Abstract — Explicit non-asymptotic upper bounds on the sizes of multiple-deletion correcting codes are presented. In particular, the largest single-deletion correcting code for a q-ary alphabet and string length n is shown to be of size at most $\frac{q^n-q}{(q-1)(n-1)}$. An improved bound on the asymptotic rate function is obtained as a corollary. Upper bounds are also derived on sizes of codes for a constrained source that does not necessarily comprise all strings of a particular length, and this idea is demonstrated by application to sets of run-length limited strings.

The problem of finding the largest deletion correcting code 
is modeled as a matching problem on a hypergraph. This 
problem is formulated as an integer linear program. The 
upper bound is obtained by the construction of a feasible 
point for the dual of the linear programming relaxation of 
this integer linear program. 

The non-asymptotic bounds derived imply the known 
asymptotic bounds of Levenshtein and Tenengolts and im- 
prove on known non-asymptotic bounds. Numerical results 
support the conjecture that in the binary case, the Varshamov- 
Tenengolts codes are the largest single-deletion correcting 
codes. 

Index Terms — Deletion channel, multiple-deletion correct- 
ing codes, single-deletion correcting codes, non-asymptotic 
bounds, hypergraphs, integer linear programming, linear 
programming relaxation, Varshamov-Tenengolts codes. 

I. Introduction 

A deletion channel is a communication channel that takes 
a string of symbols as its input and transmits only a subset 
of the input symbols leaving the order of the symbols 
unchanged. Symbols that are not transmitted constitute the 
errors in the channel and are called deletions. A deletion 
channel is distinct from the widely studied erasure channel 
wherein the positions of the errors are known. This paper 
mainly concerns deletion channels where the maximum 
number of deletions, denoted s, is fixed. 

A codebook or a deletion correcting code for the deletion channel is a set C of input strings, no two of which on transmission through the channel can result in the same output. For a string x, call the set of strings obtained by deletion of s symbols from x the s-deletion set of x. An s-deletion correcting code is thus a set of input strings with pairwise disjoint s-deletion sets.

To explain our contribution, consider the case where s = 1 (the single-deletion channel). An open problem pertaining to this channel is the determination of the size of the largest or optimal codebook $C = C^*$, for input strings comprising all strings of length n [1]. The classical bound of Levenshtein [2] provides one benchmark for optimality. For the case of binary strings, Levenshtein [2] showed that the size $|C^*|$ of an optimal codebook for the single-deletion channel is asymptotically at most $2^n/n$. It is important to note here the sense in which this asymptoticity is being defined. A function $f : \mathbb{N} \to \mathbb{R}$ is said to be asymptotically less than or equal to another function $g : \mathbb{N} \to \mathbb{R}$, written $f \lesssim g$, if $\lim_{n\to\infty} f(n)/g(n) \le 1$. f is said to be asymptotically equal to g, written $f \sim g$, if $f \lesssim g$ and $g \lesssim f$. Thus Levenshtein's result says that

$$\lim_{n\to\infty} \frac{|C^*|}{2^n/n} \le 1.$$

Levenshtein then constructs a codebook of size at least $2^n/(n+1)$, thereby proving $2^n/n \lesssim |C^*|$, and hence concludes that the optimal codebook $C^*$ has size asymptotically equal to $2^n/n$, i.e., $C^*$ satisfies $\lim_{n\to\infty} \frac{|C^*|}{2^n/n} = 1$.

If the function g is bounded, the asymptotic equality $f \sim g$ implies equality of the limiting values of $f(n)$ and $g(n)$, or their near-equality for sufficiently large n. However, since $g(n) = 2^n/n$ is unbounded, Levenshtein's asymptotic results do not allow one to obtain a fine approximation to $|C^*|$, or to conclude whether, for a particular n, $|C^*|$ is greater or less than $2^n/n$, or even to conclude the boundedness or unboundedness of the difference $\bigl||C^*| - 2^n/n\bigr|$. Indeed, the best known codes for the binary version of this channel, the Varshamov-Tenengolts (VT) codes [3], are of size at least $2^n/(n+1)$ for input length n. Although this lower bound is asymptotically equal to $2^n/n$ (and the VT codes have recently been verified by exact search to be optimal for string lengths $n \le 10$ [4]), the difference $\frac{2^n}{n} - \frac{2^n}{n+1}$ grows to infinity.
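Written out, the gap is elementary; the computation below only restates the two quantities already introduced above:

$$\frac{2^n}{n} - \frac{2^n}{n+1} = \frac{2^n}{n(n+1)} \longrightarrow \infty \quad (n \to \infty), \qquad \text{while} \qquad \frac{2^n/(n+1)}{2^n/n} = \frac{n}{n+1} \longrightarrow 1.$$

So the two quantities are asymptotically equal in the sense defined above even though their difference is unbounded; at n = 20, for instance, the difference $2^{20}/420$ already exceeds 2,000.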

In other words, for this problem, asymptotic optimality of a codebook does not say much about its optimality per se. The challenges noted above continue to hold (and are perhaps more severe) for larger alphabets and larger numbers of deletions. For the case of multiple deletions, asymptotic bounds exist, thanks to Levenshtein [2] for the binary alphabet, but little is known about the quality of these bounds, since no matching lower bounds exist. A more useful bound for any such channel would be a non-asymptotic upper bound that also implies known asymptotic bounds. Such a bound can serve as a hard bound on the size of a codebook for any string length and help in assessing the quality of specific code constructions. Such non-asymptotic upper bounds are the subject of this paper.

We derive explicit non-asymptotic upper bounds on the sizes of codebooks for any number of deletions s and any alphabet size q. These bounds imply the known asymptotic bounds of Levenshtein [2] and generalize them to larger alphabets. For the case of a single deletion we obtain this bound in closed form. We show that for string length n, an optimal q-ary single-deletion codebook has size at most $\frac{q^n-q}{(q-1)(n-1)}$. This implies the asymptotic upper bound of $\frac{q^n}{(q-1)n}$ shown by Tenengolts [5]. In the binary case, together with the size of the VT codes (which effectively provide non-asymptotic lower bounds), our upper bound $\frac{2^n-2}{n-1}$ implies Levenshtein's asymptotic results.
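To make the closed-form bound concrete, the following Python sketch (an illustration only, not the authors' code; all function names are hypothetical) evaluates $(q^n-q)/((q-1)(n-1))$ in the binary case and compares it with the size of the VT code $VT_0(n)$, computed by brute force. It assumes the standard VT definition $VT_a(n) = \{x \in \{0,1\}^n : \sum_i i\,x_i \equiv a \pmod{n+1}\}$, which is not restated in this section. Exhaustive enumeration restricts it to small n.

```python
from fractions import Fraction
from itertools import product


def single_deletion_bound(q: int, n: int) -> Fraction:
    """Closed-form upper bound (q^n - q) / ((q - 1)(n - 1)) on |C*_{q,1,n}|, for n >= 2."""
    return Fraction(q**n - q, (q - 1) * (n - 1))


def vt0_size(n: int) -> int:
    """|VT_0(n)| by brute force, with VT_0(n) = {x in {0,1}^n : sum_i i*x_i = 0 (mod n+1)}."""
    return sum(
        1
        for x in product((0, 1), repeat=n)
        if sum(i * bit for i, bit in enumerate(x, start=1)) % (n + 1) == 0
    )


if __name__ == "__main__":
    for n in range(2, 15):
        bound = single_deletion_bound(2, n)
        size = vt0_size(n)
        # The ratio |VT_0(n)| / bound tends to 1, in line with Levenshtein's asymptotics.
        print(f"n={n:2d}  |VT_0(n)|={size:5d}  bound={float(bound):9.2f}  ratio={float(size / bound):.3f}")
```

Exact rationals (`Fraction`) are used so that the comparison with integer code sizes involves no rounding.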

From these bounds we derive an upper bound on the 
asymptotic rate function. For a channel where the number 
of deletions is a constant fraction of string length, this 
function gives the asymptotic value of the rate of the largest 
deletion correcting code, as a function of the fraction of 
symbols that are deleted. This bound on the rate function 
improves on the previous bound shown by Levenshtein [6]. 

We then extend this methodology to derive bounds on
deletion correcting codes for constrained sources. These are
codebooks for a specific set of strings, i.e., not necessarily 
the set of all strings of a particular length. Recording 
systems such as magnetic tapes impose physical constraints 
on the patterns that symbols can take in codewords [7]. If 
such a code is subsequently transmitted through a deletion 
channel, the codewords can be thought of as a constrained 
source. As a specific demonstration of this idea, we derive 
non-asymptotic upper bounds on sizes of codebooks for 
run-length limited sources for the single-deletion channel. 

The bounds are obtained as follows. We characterize the largest codebook for the deletion channel as a maximum matching on a suitably defined hypergraph. The problem of finding a maximum matching is written as a 0-1 integer linear program. The fractional matching on this hypergraph is the solution of the linear programming relaxation of this integer linear program, and its value is an upper bound on the size of the maximum matching. Our upper bound is obtained by constructing a feasible solution for the dual of this linear program. For the single-deletion channel the construction is such that it allows for the calculation of the dual objective in closed form as $\frac{q^n-q}{(q-1)(n-1)}$. Unfortunately, for larger numbers of deletions, due to the complicated nature of the resulting expressions, we are unable to produce closed-form expressions.



Numerical computations reveal that for the binary single-deletion channel the optimal fractional matching size is quite close to the size of the VT codes. For strings of length up to 14, the difference between the size of the VT codes and the optimal fractional matching is at most 8; this indicates that the VT codes are either optimal or very close to being optimal (at least up to string length 14). On a side note, the hypergraph approach also appears to be more amenable to algorithmic approaches due to its compact representation; this aspect of the paper may be of independent interest.

A. Related work 

A wide-ranging survey on various results and challenges 
associated with deletion correction and its variants was 
recently presented by Mercier et al. [8]. Sloane's survey [1] 
deals specifically with the binary single-deletion channel 
and illuminates several deep open questions pertaining to 
the VT codes. Here we recall some highlights from this 
area of work. 

The study of the deletion channel has a long history going back at least to the seminal work of Levenshtein [2], wherein asymptotic bounds on the sizes of optimal binary codebooks were derived. For s deletions and binary input strings, Levenshtein [2] showed that the largest codebook $C^*_{2,s,n}$ for string length n satisfies the asymptotic relations

$$\frac{2^s (s!)^2\, 2^n}{n^{2s}} \lesssim |C^*_{2,s,n}| \lesssim \frac{s!\, 2^n}{n^s}. \qquad (1)$$

Levenshtein [2] also noticed that the Varshamov-Tenengolts codes [3], which were proposed for asymmetric error correction, served as asymptotically optimal codes for the binary single-deletion channel; these remain to date the best known codes and have recently been confirmed to be optimal for string lengths up to 10 [4]. An independent line of study on this topic appears to have been contemporaneously pursued by Ullman [9], [10].

Thereafter there have been many efforts at code construction. An attempt at generalizing the VT codes for the binary multiple-deletion channel was made by Helberg and Ferreira [11]; that this generalization indeed corrects deletion errors was recently shown by Abdel-Ghaffar et al. [12]. For non-binary alphabets this problem was first studied by Calabi and Hartnett [13] and Tanaka and Kasai [14]. Later, Tenengolts [5] proposed a construction similar to the VT codes for the q-ary single-deletion channel and showed that the optimal codebook for string length n, $C^*_{q,1,n}$, is of size at least $\frac{q^n}{qn}$ and satisfies the asymptotic upper bound $|C^*_{q,1,n}| \lesssim \frac{q^n}{(q-1)n}$. Interestingly, no asymptotic bounds for q-ary s-deletion correcting codes appear to have been explicitly articulated, though Levenshtein's original proof from [2] seems extendable to q-ary strings. The VT codes are number-theoretic, and the underlying number-theoretic logic was generalized to correct larger numbers of asymmetric errors by Varshamov [15].

Butenko et al. attempted to find codes algorithmically 
by casting this problem as a maximum independent set 
problem on a class of graphs [16]. Schulman and Zuck- 
erman considered a construction that is in part algorithmic 
and showed the existence of 'asymptotically good' codes 
for deletions whose number increases proportionally to the 
length of the string [17]. More recently, the algorithmic 
approach has been pursued by Khajouei et al. [18] and 
a graph coloring based approach was studied by Cullina 
et al. [19]. Finding codes for the deletion channel, either 
algorithmically or through a number-theoretic construction, 
is a considerable challenge, as evidenced by the attempts at 
achieving the records for largest codebooks on the webpage 
maintained by Sloane [4]. 

Deletion errors have also been studied for run-length lim- 
ited sources - which we consider in this paper as an example
of a constrained source - by Roth and Siegel [20], Hilden et 
al. [21] and Bours [22], amongst others. However in these 
works, the deletion errors considered have a specific pattern 
and do not exactly correspond to the deletion channel we 
consider. Exceptions to this are the recent works of Cheng 
et al. [23] and Paluncic et al. [24] which consider codes 
for run-length limited sources for the deletion channel in 
its full generality. 

The topic of deletion errors has spawned research on 
related questions, such as the existence of 'perfect codes' 
(Levenshtein [25]), and the combinatorial problems of 
counting subsequences (e.g., Hirschberg and Regnier [26], 
Swart and Ferreira [27], Mercier et al. [28] and more 
recently, Liron and Langberg [29]) and the reconstruction of 
sequences (Levenshtein [30], [31]). Another body of active 
ongoing research studies the capacity of the deletion chan- 
nel (e.g., Mitzenmacher [32], Kanoria and Montanari [33], 
and Diggavi et al. [34]). 

The question of non-asymptotic upper bounds, which is our interest, is comparatively less studied. One may scan Levenshtein's proof of the asymptotic bound from [2] to see if a non-asymptotic bound has been found in it as an intermediate step. For the single-deletion channel, the bound so discovered (see Sloane's proof [1, Theorem 2.5]) is greater than

$$\frac{2^n}{n - 2\sqrt{n \log n}}$$

(for the binary alphabet), which is clearly weaker than our bound. In fact, Levenshtein [6] has presented a somewhat more general bound on the size of a q-ary s-deletion correcting code:

$$|C^*_{q,s,n}| \le \frac{q^{n-s}}{\sum_{i=0}^{s} \binom{r-s+1}{i}} + q \sum_{j=0}^{r-1} \binom{n-1}{j} (q-1)^j, \qquad (2)$$

where r is any integer satisfying $1 \le s \le r+1 \le n$. It is
not clear which value of r provides the strongest bound of these (although a heuristic argument using Stirling's approximation suggests that $r \approx \frac{n}{2} - \sqrt{n \log n}$ should be optimal in the binary single-deletion case; this is essentially Levenshtein's original argument [2]). We have found via numerical calculation that the strongest of the bounds in (2) is weaker than our bound. Additionally, our bound in the single-deletion case has the attraction of being in closed form. Levenshtein, in another paper, derives another non-asymptotic bound for the size of a q-ary single-deletion
codebook [25, Theorem 5.1],

$$|C^*_{q,1,n}| \le \frac{q^{n-1} + (n-2)q^{n-2} + q}{n}, \qquad (3)$$

but this bound is asymptotically much weaker than Tenengolts' asymptotic bound of $\frac{q^n}{(q-1)n}$ (their ratio grows to infinity; our bound implies Tenengolts' asymptotic bound).
Sloane's website [4] contains several numerical bounds found by calculating the Lovász $\vartheta$ function [35] on certain graphs. But unlike our bounds, there are no expressions (closed form or otherwise) for these bounds.

The scarcity of non-asymptotic upper bounds is perhaps due to the property that deletion sets of distinct strings can have distinct sizes. This point has also been stressed by Sloane [1, Section "Optimality"]: "It is more difficult to obtain upper bounds for deletion-correcting codes than for conventional error-correcting codes, since the disjoint balls $D_e(u)$ (deletion sets) associated with the codewords ... do not all have the same size. Furthermore the metric space $(V_2, d)$¹ is not an association scheme and so there is no obvious linear programming bound." In light of this comment it is interesting that our non-asymptotic bound is obtained from a linear programming argument, and that it relies critically on the sizes of the deletion sets.



B. Organization 

This paper is organized as follows. Section II comprises preliminaries, including notation, the problem definition, background on hypergraphs and the derivation of lemmas that are of use in our analysis. Section III contains the hypergraph characterization of the optimal codebook and the derivation of the upper bounds for single-deletion correcting codes. In Section IV we extend the analysis to obtain bounds on codes for larger numbers of deletions and derive a bound on the asymptotic rate function. In Section V, we derive bounds on codebooks for constrained sources, in particular, for run-length limited sources. Numerical simulations comparing the values of Levenshtein's bound from (2), our bound, the tightest bound obtainable by our logic, and the best known codes are presented in Section VI. In Section VII we discuss our results and possible avenues for tightening our bound and conclude the paper.

¹ d is the Levenshtein or edit distance, cf. Definition 2.4.
II. Preliminaries

Let $\mathbb{F}_q = \{0, 1, \ldots, q-1\}$ be a q-ary alphabet and let $\mathbb{F}_q^n$ denote the set of all q-ary sequences of length n. Any such q-ary sequence is called a string. We let $\mathbb{F}_q^* = \bigcup_{n=0}^{\infty} \mathbb{F}_q^n$ denote the set of all strings; here $\mathbb{F}_q^0$ consists of the empty string. Let $x = x_1 \ldots x_n$ be a string. A subsequence of x is formed by taking a subset of the symbols of x and aligning them without altering their order. In other words, a subsequence of x is a sequence $y = x_{i_1} \ldots x_{i_k}$, where $1 \le k \le n$ and the indices satisfy $1 \le i_1 < \cdots < i_k \le n$; x is called a supersequence of y. We say that y is obtained from x by the deletion of $n-k$ symbols and x is obtained from y by the insertion of $n-k$ symbols.

A specific type of subsequence that is important for our 
results is a run, defined below. 

Definition 2.1: Let $x = x_1 \ldots x_n \in \mathbb{F}_q^n$ be a string. A run of x is a maximal contiguous subsequence with identical symbols, i.e., a run of x is a sequence $x_i x_{i+1} \ldots x_{i+j}$, $1 \le i \le i+j \le n$, with the property that $x_i = x_{i+1} = \cdots = x_{i+j}$ and the properties that a) if $1 < i$, then $x_{i-1} \ne x_i$, and b) if $i+j < n$, then $x_{i+j} \ne x_{i+j+1}$. For any $x \in \mathbb{F}_q^*$, $r(x)$ denotes the number of runs of x.
For example, if q = 3 and x = 120010, the runs of x are 1, 2, 00, 1, 0 and r(x) = 5. Clearly, for any $x \in \mathbb{F}_q^n$, $1 \le r(x) \le n$.

Definition 2.2: For any string $x \in \mathbb{F}_q^*$, the set of subsequences of x obtained by deletion of s symbols is denoted by $D_s(x)$, and the set of supersequences obtained by insertion of s symbols into x is denoted by $I_s(x)$. We call $D_s(x)$ and $I_s(x)$ the s-deletion set of x and the s-insertion set of x, respectively.

For example, if q = 3, s = 1 and x = 120010, then $D_1(x) = \{20010, 10010, 12010, 12000, 12001\}$. Notice that subsequences obtained by the deletion of a symbol from the same run of x are all identical. For example, in the run 00, deletion of either symbol results in the same subsequence 12010. Consequently we have the following relation [25],

$$|D_1(x)| = r(x), \quad \forall x \in \mathbb{F}_q^*. \qquad (4)$$
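Relation (4) is easy to check by brute force. The Python sketch below is only an illustration: the names `num_runs`, `deletion_set` and `check_relation_4` are hypothetical, and symbols are assumed to be the single characters '0' to 'q-1' (so q ≤ 10). It builds $D_s(x)$ by deleting one symbol at a time and compares $|D_1(x)|$ with $r(x)$ over all of $\mathbb{F}_q^n$.

```python
from itertools import product


def num_runs(x: str) -> int:
    """r(x): number of maximal blocks of identical consecutive symbols in x."""
    if not x:
        return 0
    return 1 + sum(1 for a, b in zip(x, x[1:]) if a != b)


def deletion_set(x: str, s: int = 1) -> set:
    """D_s(x): the distinct subsequences of x obtained by deleting exactly s symbols."""
    current = {x}
    for _ in range(s):
        # Delete one more symbol, at every position, from every string found so far.
        current = {y[:i] + y[i + 1:] for y in current for i in range(len(y))}
    return current


def check_relation_4(q: int, n: int) -> bool:
    """Exhaustively check |D_1(x)| = r(x) for all x in F_q^n (symbols '0'..str(q-1), q <= 10)."""
    symbols = [str(a) for a in range(q)]
    return all(
        len(deletion_set("".join(x))) == num_runs("".join(x))
        for x in product(symbols, repeat=n)
    )


if __name__ == "__main__":
    print(sorted(deletion_set("120010")), num_runs("120010"))
    print(check_relation_4(3, 5))
```

Because `deletion_set` collects strings in a Python set, deletions inside the same run collapse automatically to a single subsequence, which is exactly the observation behind (4).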

For $s > 1$, expressions for $|D_s(x)|$ get increasingly complicated, and depend on statistics of x other than the number of runs (see, e.g., [28] for one set of expressions). We discuss bounds on $|D_s(\cdot)|$ later in Section IV.

Surprisingly, the size of $I_s(x)$ is independent of x and is a function only of the length of x and the size of the alphabet [36, Lemma 1, p. 354]. Specifically, we have

$$|I_s(x)| = \sum_{i=0}^{s} \binom{n+s}{i} (q-1)^i, \quad \forall x \in \mathbb{F}_q^n. \qquad (5)$$

We denote this quantity by $I_{q,s,n}$,

$$I_{q,s,n} = \sum_{i=0}^{s} \binom{n+s}{i} (q-1)^i. \qquad (6)$$
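The claim that $|I_s(x)|$ depends only on n, q and s can likewise be spot-checked. This sketch uses hypothetical helper names and assumes single-character symbols (q ≤ 10); it grows $I_s(x)$ by inserting one symbol at a time and compares its size with $I_{q,s,n}$ from (6) for every x of a given length.

```python
from itertools import product
from math import comb


def insertion_set(x: str, s: int, q: int) -> set:
    """I_s(x): distinct supersequences of x obtained by inserting s symbols from {0, ..., q-1}."""
    symbols = [str(a) for a in range(q)]  # assumes q <= 10 so every symbol is one character
    current = {x}
    for _ in range(s):
        # Insert one more symbol, at every position, into every string found so far.
        current = {y[:i] + a + y[i:] for y in current for i in range(len(y) + 1) for a in symbols}
    return current


def insertion_set_size(q: int, s: int, n: int) -> int:
    """I_{q,s,n} = sum_{i=0}^{s} C(n+s, i) (q-1)^i, as in (6)."""
    return sum(comb(n + s, i) * (q - 1) ** i for i in range(s + 1))


def check_formula(q: int, s: int, n: int) -> bool:
    """True if |I_s(x)| equals I_{q,s,n} for every x in F_q^n."""
    expected = insertion_set_size(q, s, n)
    symbols = [str(a) for a in range(q)]
    return all(
        len(insertion_set("".join(x), s, q)) == expected
        for x in product(symbols, repeat=n)
    )


if __name__ == "__main__":
    print(insertion_set_size(2, 1, 3), sorted(insertion_set("010", 1, 2)))
```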



As a general rule, instead of using '1-deletion' or '1-insertion' (correcting code, set, ...), we use the more elegant 'single-deletion' or 'single-insertion' (correcting code, set, ...).

The central object of our interest, namely, a deletion 
correcting code is defined below. 

Definition 2.3: An s-deletion correcting code (or "s-deletion codebook") for string length n and alphabet $\mathbb{F}_q$ is a set $C \subseteq \mathbb{F}_q^n$ with the property that the sets $D_s(x)$, $x \in C$, are pairwise disjoint. The largest such code is denoted by $C^*_{q,s,n}$ and called an optimal s-deletion correcting code or optimal s-deletion codebook.

A code capable of correcting s deletions is also capable of correcting a total of s insertions and deletions [2], whereby an s-deletion correcting code is also an s-insertion correcting code (i.e., a set $C \subseteq \mathbb{F}_q^n$ such that the sets $I_s(x)$, $x \in C$, are pairwise disjoint) [2]. Another characterization of s-deletion correcting codes is through the Levenshtein distance.

Definition 2.4: For any $x, y \in \mathbb{F}_q^*$, define the Levenshtein distance or edit distance $d(x,y)$ as the minimum number of insertions and deletions required to obtain x from y.
A set $C \subseteq \mathbb{F}_q^n$ is an s-deletion correcting code if and only if $d(x,y) > 2s$ for any two distinct strings $x, y \in C$. In summary, we have the following equivalence [2].

Lemma 2.1: For any $x, y \in \mathbb{F}_q^n$, the following three statements are equivalent.

1) $d(x,y) \le 2s$,

2) $D_s(x) \cap D_s(y) \ne \emptyset$,

3) $I_s(x) \cap I_s(y) \ne \emptyset$.
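For small parameters, Lemma 2.1 can be confirmed exhaustively. The sketch below is illustrative only (helper names are hypothetical) and assumes the standard identity $d(x,y) = |x| + |y| - 2\,\mathrm{LCS}(x,y)$ for the insertion/deletion distance of Definition 2.4, where LCS is the length of a longest common subsequence; that identity is not stated in the text.

```python
from itertools import product


def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (standard dynamic program)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(x: str, y: str) -> int:
    """Insertion/deletion distance, via the assumed identity |x| + |y| - 2 * LCS(x, y)."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


def deletion_set(x: str, s: int) -> set:
    current = {x}
    for _ in range(s):
        current = {y[:i] + y[i + 1:] for y in current for i in range(len(y))}
    return current


def insertion_set(x: str, s: int, q: int) -> set:
    symbols = [str(a) for a in range(q)]  # assumes q <= 10
    current = {x}
    for _ in range(s):
        current = {y[:i] + a + y[i:] for y in current for i in range(len(y) + 1) for a in symbols}
    return current


def lemma_2_1_holds(q: int, s: int, n: int) -> bool:
    """Check that statements 1), 2) and 3) of Lemma 2.1 agree for every pair x, y in F_q^n."""
    strings = ["".join(t) for t in product([str(a) for a in range(q)], repeat=n)]
    dels = {x: deletion_set(x, s) for x in strings}
    ins = {x: insertion_set(x, s, q) for x in strings}
    for x in strings:
        for y in strings:
            close = indel_distance(x, y) <= 2 * s
            share_deletion = bool(dels[x] & dels[y])
            share_insertion = bool(ins[x] & ins[y])
            if not (close == share_deletion == share_insertion):
                return False
    return True


if __name__ == "__main__":
    print(lemma_2_1_holds(2, 1, 4), lemma_2_1_holds(3, 1, 3))
```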

The following lemma, although not directly related to deletion correction, will be required for our analysis.

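Lemma 2.2: Let $n, k, d \in \mathbb{N}$, $k \le n$, $dk \le n$, and let $t_1, \ldots, t_k$ be variables taking values in $\mathbb{N}$. The number of solutions $(t_1, \ldots, t_k)$ to the set of equations

$$\sum_{i=1}^{k} t_i = n, \quad t_i \ge d, \quad t_i \in \mathbb{N}, \quad \forall\, 1 \le i \le k, \qquad (7)$$

is $\binom{n-k(d-1)-1}{k-1}$.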

Proof: First suppose d = 1. Consider an array of n 1's and insert k-1 0's between the 1's, so that no two 0's are inserted next to each other and no 0's are inserted at the beginning or the end of the array. There is a one-to-one correspondence between an arrangement of this kind and a solution of (7): $t_i$, for $1 < i < k$, corresponds to the number of 1's between the $(i-1)$th and $i$th 0's, and $t_1$, $t_k$ are the numbers of 1's at the beginning and the end of the array. The number of such arrangements is easily seen to be $\binom{n-1}{k-1}$, since the k-1 0's occupy distinct positions among the n-1 gaps between consecutive 1's.
Now suppose d > 1. Notice that the system (7) is equivalent to the system

$$\sum_{i=1}^{k} \bigl(t_i - (d-1)\bigr) = n - k(d-1),$$

$$t_i - (d-1) \ge 1, \quad t_i - (d-1) \in \mathbb{N}, \quad \forall\, 1 \le i \le k.$$

This system reduces to the earlier case with d = 1, but with variables $t_i' = t_i - (d-1)$, for $i = 1, \ldots, k$, summing to $n - k(d-1)$. The number of solutions in this case is $\binom{n-k(d-1)-1}{k-1}$. ■
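Lemma 2.2 can be sanity-checked by enumeration. In this sketch (hypothetical names), the closed form is compared with a direct count of all k-tuples with entries at least d summing to n; since enumeration is exponential in k, only small n and k are tried.

```python
from itertools import product
from math import comb


def count_by_enumeration(n: int, k: int, d: int) -> int:
    """Count integer tuples (t_1, ..., t_k) with each t_i >= d and t_1 + ... + t_k = n."""
    return sum(1 for t in product(range(d, n + 1), repeat=k) if sum(t) == n)


def count_by_formula(n: int, k: int, d: int) -> int:
    """Closed form of Lemma 2.2; requires k <= n and d * k <= n."""
    return comb(n - k * (d - 1) - 1, k - 1)


if __name__ == "__main__":
    for n in range(1, 9):
        for k in range(1, min(n, 4) + 1):
            for d in range(1, n // k + 1):  # d >= 1 because t_i takes values in N
                assert count_by_enumeration(n, k, d) == count_by_formula(n, k, d), (n, k, d)
    print("Lemma 2.2 agrees with enumeration for n <= 8, k <= 4")
```

For example, with n = 5, k = 2, d = 2 the solutions are (2, 3) and (3, 2), and the formula gives $\binom{2}{1} = 2$.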



A. Background on hypergraphs 

The contents of this section are sourced from Berge [37]. 

A hypergraph is a generalization of the concept of 
a graph. In a graph edges are pairs of vertices. In a 
hypergraph, one allows arbitrary nonempty sets of vertices, 
including those with exactly one element, to be the so- 
called hyperedges. Formally, 

Definition 2.5: A hypergraph $\mathcal{H}$ is a tuple $(X, \mathcal{E})$, where X is a finite set and $\mathcal{E}$ is a collection of nonempty subsets of X such that $\bigcup_{E \in \mathcal{E}} E = X$. X is called the vertex set, its elements are called vertices, and the elements of $\mathcal{E}$ are called hyperedges.

When a vertex belongs to a hyperedge, we say it is covered 
by the hyperedge. The above definition assumes that the 
hypergraph contains no exposed vertex, i.e., a vertex that 
is covered by no hyperedge. This is a matter of convention; 
other definitions, e.g. [38], do not impose this requirement. 

Let $\mathcal{E} = \{E_1, \ldots, E_m\}$ be the set of hyperedges of the hypergraph $\mathcal{H} = (X, \mathcal{E})$. For a set of indices $J \subseteq \{1, \ldots, m\}$, the partial hypergraph generated by J is $\mathcal{H}_J = (X_J, \{E_j \mid j \in J\})$, where $X_J = \bigcup_{j \in J} E_j$.

Hyperedges are defined as sets and as such one can talk 
of intersection of hyperedges. Specifically, two hyperedges 
are disjoint if there is no vertex that is covered by both 
hyperedges. The idea of packing neighborhoods or spheres 
used in coding theory sits naturally in the theory of hyper- 
graphs. A packing of hyperedges is called a matching. 

Definition 2.6: A matching of a hypergraph $\mathcal{H} = (X, \mathcal{E})$ is a collection of pairwise disjoint hyperedges $E_1, \ldots, E_j \in \mathcal{E}$. The matching number of $\mathcal{H}$, denoted $\nu(\mathcal{H})$, is the largest j for which such a matching exists.
A dual concept (in a sense we make precise below) of a 
matching is a transversal. 

Definition 2.7: A transversal of a hypergraph $\mathcal{H} = (X, \mathcal{E})$ is a subset $T \subseteq X$ that intersects every hyperedge in $\mathcal{E}$. The transversal number of $\mathcal{H}$, denoted $\tau(\mathcal{H})$, is the smallest size of a transversal.

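Suppose $\mathcal{H} = (X, \mathcal{E})$ is a hypergraph with n vertices $x_1, \ldots, x_n$ and m hyperedges $E_1, \ldots, E_m$. Consider a matrix $A \in \{0,1\}^{n \times m}$, where the element in the ith row and jth column is

$$A_{ij} = \begin{cases} 1, & \text{if } x_i \in E_j, \\ 0, & \text{otherwise.} \end{cases}$$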

A is called the incidence matrix of $\mathcal{H}$. The matching number and the transversal number are both solutions of integer linear programs. In the rest of this paper, we refer to problem (8) below as the matching problem and (9) as the transversal problem on hypergraph $\mathcal{H}$.

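Lemma 2.3: The matching number and transversal number are solutions of integer linear programs:

$$\nu(\mathcal{H}) = \max\{\mathbf{1}^\top z \mid Az \le \mathbf{1},\ z_j \in \{0,1\},\ 1 \le j \le m\}, \qquad (8)$$

$$\tau(\mathcal{H}) = \min\{\mathbf{1}^\top w \mid A^\top w \ge \mathbf{1},\ w_i \in \{0,1\},\ 1 \le i \le n\}, \qquad (9)$$

where $\mathbf{1}$ denotes a column vector of all 1's of appropriate dimension.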

Proof: In the integer linear programming formulation of the matching problem, each hyperedge $E_j \in \mathcal{E}$ corresponds to a variable $z_j \in \{0,1\}$, and z is the vector $(z_1, \ldots, z_m)$. The variable $z_j$ is interpreted as the indicator function that identifies whether hyperedge $E_j$ is a part of the matching represented by z. Thus $z_j = 1$ if $E_j$ is selected, and $z_j = 0$ otherwise. The matching problem has one constraint for each vertex: for a vertex $x_i$, the sum of $z_j$ over those hyperedges j that cover vertex $x_i$ is at most 1; hence, at most one of these $z_j$ takes value 1. Consequently, a vector z is feasible for the matching problem if and only if the collection $\{E_j : z_j = 1\}$ is a matching of $\mathcal{H}$. It follows that the matching number of $\mathcal{H}$ is the optimal value of (8).

By a similar construction, in the integer linear programming formulation of the transversal problem, let each vertex $x_i \in X$ correspond to a variable $w_i \in \{0,1\}$ and let $w = (w_1, \ldots, w_n)$. The variable $w_i = 1$ if and only if vertex $x_i$ is included in the transversal represented by w. The transversal problem has one constraint for each hyperedge, which says that for a hyperedge $E_j$, the sum of $w_i$ over those vertices i that are covered by $E_j$ is at least 1, whereby at least one of these $w_i$ takes value 1. There is thus a one-to-one correspondence between a transversal of $\mathcal{H}$ and a feasible vector w for (9). The transversal number is thus characterized by (9). ■
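A brute-force rendering of (8) and (9) may help fix ideas. The sketch below uses hypothetical names and exhaustive search, so it is meant for tiny hypergraphs only: it builds the incidence matrix A with vertices as rows, then finds the largest 0-1 vector z with $Az \le \mathbf{1}$ and the smallest 0-1 vector w with $A^\top w \ge \mathbf{1}$. On the triangle (three vertices, the three pairs as hyperedges) it gives $\nu = 1$ and $\tau = 2$, so the matching and transversal numbers can differ.

```python
from itertools import combinations


def incidence_matrix(vertices, hyperedges):
    """A[i][j] = 1 if vertex i is covered by hyperedge j, else 0 (rows: vertices)."""
    return [[1 if v in e else 0 for e in hyperedges] for v in vertices]


def matching_number(vertices, hyperedges) -> int:
    """nu(H) from (8): the largest 0-1 vector z with A z <= 1, by exhaustive search."""
    A = incidence_matrix(vertices, hyperedges)
    m = len(hyperedges)
    for size in range(m, 0, -1):
        for chosen in combinations(range(m), size):
            # Each vertex (row of A) may be covered by at most one chosen hyperedge.
            if all(sum(row[j] for j in chosen) <= 1 for row in A):
                return size
    return 0


def transversal_number(vertices, hyperedges) -> int:
    """tau(H) from (9): the smallest 0-1 vector w with A^T w >= 1, by exhaustive search."""
    A = incidence_matrix(vertices, hyperedges)
    for size in range(len(vertices) + 1):
        for chosen in combinations(range(len(vertices)), size):
            # Each hyperedge (column of A) must contain at least one chosen vertex.
            if all(any(A[i][j] for i in chosen) for j in range(len(hyperedges))):
                return size
    return len(vertices)


if __name__ == "__main__":
    triangle = [{1, 2}, {2, 3}, {1, 3}]
    print(matching_number([1, 2, 3], triangle), transversal_number([1, 2, 3], triangle))
```

For the path with hyperedges {1, 2}, {2, 3}, {3, 4}, both numbers equal 2.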

Notice that the mathematical programs in (8) and (9) form a dual pair: their linear programming relaxations are duals of each other. A fundamental theorem of linear programming states that a pair of dual programs satisfies weak duality, and imposing integrality on both programs preserves it. Weak duality means that, of the pair of dual problems, the value of the maximization problem is no greater than the value of the minimization problem [39]. Applied to (8)-(9), this implies, for any hypergraph $\mathcal{H}$,

$$\nu(\mathcal{H}) \le \tau(\mathcal{H}). \qquad (10)$$
We note a technical point about problems (8)-(9) that helps in simplifying our analysis. Notice that the constraint $z_j \in \{0,1\}$ in (8) and the constraint $w_i \in \{0,1\}$ in (9) may as well be replaced with the constraints $z_j \in \mathbb{Z}_+$ and $w_i \in \mathbb{Z}_+$, respectively, where $\mathbb{Z}_+$ is the set of nonnegative integers, to give the following equivalent characterizations of $\nu(\mathcal{H})$ and $\tau(\mathcal{H})$:

$$\nu(\mathcal{H}) = \max\{\mathbf{1}^\top z \mid Az \le \mathbf{1},\ z_j \in \mathbb{Z}_+,\ 1 \le j \le m\}, \qquad (11)$$

$$\tau(\mathcal{H}) = \min\{\mathbf{1}^\top w \mid A^\top w \ge \mathbf{1},\ w_i \in \mathbb{Z}_+,\ 1 \le i \le n\}. \qquad (12)$$

To see the equivalence between (8) and (11), notice that no vector $z \in \mathbb{Z}_+^m$ satisfying $Az \le \mathbf{1}$ can have a component greater than 1, since every hyperedge covers at least one vertex. And in (9), observe that no minimizing $w \in \mathbb{Z}_+^n$ of (12) can have a component greater than 1, since lowering such a component to 1 keeps $A^\top w \ge \mathbf{1}$ and reduces the weight. From now on, we consider only the formulations (11)-(12). Note that sources such as Berge [37] omit the above analysis and directly employ (11)-(12) to define $\nu(\mathcal{H})$ and $\tau(\mathcal{H})$.

The linear programming relaxation of an integer program is constructed by replacing the requirement that a variable takes only integral values by a requirement that allows the variable to also take any real value between the integral values (i.e., in the convex hull of the integral values) [39]. By $\nu^*(\mathcal{H})$ and $\tau^*(\mathcal{H})$ we denote the values of the linear programming relaxations of (11) and (12), respectively, i.e.,

$$\nu^*(\mathcal{H}) = \max\{\mathbf{1}^\top z \mid Az \le \mathbf{1},\ z \ge 0\}, \qquad (13)$$

$$\tau^*(\mathcal{H}) = \min\{\mathbf{1}^\top w \mid A^\top w \ge \mathbf{1},\ w \ge 0\}, \qquad (14)$$

where for simplicity we denote a vector of zeros of appropriate size also by '0'. $\nu^*(\mathcal{H})$ and $\tau^*(\mathcal{H})$ are called the fractional matching number and fractional transversal number of $\mathcal{H}$. A vector z feasible for (13) is called a fractional matching, and the set $\{z : Az \le \mathbf{1}, z \ge 0\}$ is called the fractional matching polytope of $\mathcal{H}$. A vector w feasible for (14) is called a fractional transversal, and the set $\{w : A^\top w \ge \mathbf{1}, w \ge 0\}$ is called the fractional transversal polytope. $\mathbf{1}^\top z$ and $\mathbf{1}^\top w$ are called the weights of z and w. Being values of a pair of dual linear programs, $\nu^*(\mathcal{H})$ and $\tau^*(\mathcal{H})$ satisfy the fundamental property of strong duality [39], i.e.,

$$\nu^*(\mathcal{H}) = \tau^*(\mathcal{H}).$$

Thus for any hypergraph the fractional matching number 
and the fractional transversal number are equal. In general, 
integer programs do not satisfy strong duality and thereby 
equality may not hold in (10). Equality or lack thereof in 
(10) depends on the shape of the fractional matching and 
fractional transversal polytopes. On a side note, we recall 
that linear programming relaxations have been employed in 
the decoding of binary linear codes by Feldman et al. [40]. 

Fractional matchings and transversals do not have as direct a counting interpretation as the vectors feasible for (8)-(9). However, they are extremely useful for obtaining bounds. Since the feasible regions of the integer programs are contained in the feasible regions of their linear programming relaxations, we immediately have $\nu(\mathcal{H}) \le \nu^*(\mathcal{H})$ and $\tau^*(\mathcal{H}) \le \tau(\mathcal{H})$. Furthermore, we have the following lemma.

Lemma 2.4: For any hypergraph $\mathcal{H}$, we have

$$\nu(\mathcal{H}) \le \nu^*(\mathcal{H}) = \tau^*(\mathcal{H}) \le \tau(\mathcal{H}).$$

In particular,

$$\nu(\mathcal{H}) \le \tau^*(\mathcal{H}) \le \mathbf{1}^\top w,$$

for any fractional transversal w.

Proof: Since the fractional matching and fractional transversal problems are relaxations of the matching and transversal problems, $\nu(\mathcal{H}) \le \nu^*(\mathcal{H})$ and $\tau^*(\mathcal{H}) \le \tau(\mathcal{H})$. By the duality theorem of linear programming, $\nu^*(\mathcal{H})$ and $\tau^*(\mathcal{H})$ are equal. By definition, any fractional transversal w must have weight no less than the fractional transversal number, from which the last claim follows. ■
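Lemma 2.4 is what turns any fractional transversal into a certificate. Below is a minimal sketch with exact rational arithmetic (names are hypothetical), again on the triangle: the vector with all entries 1/2 is both a fractional matching and a fractional transversal of weight 3/2, so $\nu^* = \tau^* = 3/2$, and since $\nu$ is an integer, $\nu \le \lfloor 3/2 \rfloor = 1$.

```python
from fractions import Fraction


def incidence_matrix(vertices, hyperedges):
    """Rows are vertices, columns are hyperedges; entry 1 if the vertex is covered."""
    return [[1 if v in e else 0 for e in hyperedges] for v in vertices]


def is_fractional_matching(A, z) -> bool:
    """Feasibility for (13): z >= 0 and A z <= 1 (one constraint per vertex)."""
    return all(zj >= 0 for zj in z) and all(
        sum(a * zj for a, zj in zip(row, z)) <= 1 for row in A
    )


def is_fractional_transversal(A, w) -> bool:
    """Feasibility for (14): w >= 0 and A^T w >= 1 (one constraint per hyperedge)."""
    num_edges = len(A[0]) if A else 0
    return all(wi >= 0 for wi in w) and all(
        sum(A[i][j] * w[i] for i in range(len(A))) >= 1 for j in range(num_edges)
    )


# Triangle: vertices 1, 2, 3 and the three pairs as hyperedges.
A = incidence_matrix([1, 2, 3], [{1, 2}, {2, 3}, {1, 3}])
half = [Fraction(1, 2)] * 3

if __name__ == "__main__":
    # A feasible z bounds nu* from below and a feasible w bounds tau* from above;
    # equal weights therefore certify nu* = tau* = 3/2 (Lemma 2.4), hence nu <= 1.
    print(is_fractional_matching(A, half), is_fractional_transversal(A, half), sum(half))
```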
We end this survey with one final concept, that of a line 
graph. 

Definition 2.8: The line graph of a hypergraph $\mathcal{H} = (X, \mathcal{E})$ is a graph $L(\mathcal{H})$ with vertices given by the hyperedges of $\mathcal{H}$, in which two vertices are joined by an edge if they intersect as hyperedges in $\mathcal{H}$.
An independent set of a graph is a set of vertices, no two of which share an edge. For a graph G we denote the size of its largest independent set, or its independence number, by $\alpha(G)$. Now consider a hypergraph $\mathcal{H}$. An independent set of its line graph $L(\mathcal{H})$ corresponds to a collection of hyperedges of $\mathcal{H}$ that are pairwise disjoint. Consequently,

$$\nu(\mathcal{H}) = \alpha(L(\mathcal{H})), \qquad (15)$$

i.e., the matching number of a hypergraph equals the independence number of its line graph.
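Equation (15) can be illustrated directly. This sketch (hypothetical names, exhaustive search) builds the line graph of a small hypergraph, with hyperedges given as Python sets, and computes its independence number; on the small examples tried, this reproduces the matching number.

```python
from itertools import combinations


def line_graph(hyperedges):
    """Edge set of L(H): index pairs (j, k), j < k, whose hyperedges intersect."""
    return {
        (j, k)
        for j, k in combinations(range(len(hyperedges)), 2)
        if hyperedges[j] & hyperedges[k]
    }


def independence_number(num_vertices, edges):
    """alpha(G) by exhaustive search over vertex subsets, largest first (small graphs only)."""
    for size in range(num_vertices, 0, -1):
        for subset in combinations(range(num_vertices), size):
            if all(pair not in edges for pair in combinations(subset, 2)):
                return size
    return 0


if __name__ == "__main__":
    path = [{1, 2}, {2, 3}, {3, 4}]
    print(line_graph(path), independence_number(len(path), line_graph(path)))
```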

III. NON-ASYMPTOTIC UPPER BOUNDS FOR
SINGLE-DELETION CORRECTING CODES

A. Hypergraph characterization 

The contents of this subsection apply to any number s of deletions. We will specialize to single deletions and present our bounds in the following subsection.

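Consider the following hypergraphs:

$$\mathcal{H}^D_{q,s,n} = \bigl(\mathbb{F}_q^{n-s},\ \{D_s(y) : y \in \mathbb{F}_q^n\}\bigr), \qquad \mathcal{H}^I_{q,s,n} = \bigl(\mathbb{F}_q^{n+s},\ \{I_s(y) : y \in \mathbb{F}_q^n\}\bigr).$$

In each of these hypergraphs, hyperedges correspond to strings in $\mathbb{F}_q^n$, and the vertices are strings in $\mathbb{F}_q^{n-s}$ and $\mathbb{F}_q^{n+s}$ for $\mathcal{H}^D_{q,s,n}$ and $\mathcal{H}^I_{q,s,n}$, respectively. By Definition 2.3, an s-deletion correcting code in $\mathbb{F}_q^n$ corresponds to disjoint hyperedges in $\mathcal{H}^D_{q,s,n}$ and therefore to a matching in $\mathcal{H}^D_{q,s,n}$. The size $|C^*_{q,s,n}|$ of the largest codebook for string length n is thus equal to the matching number of $\mathcal{H}^D_{q,s,n}$. The matching problem for $\mathcal{H}^D_{q,s,n}$, when written explicitly, is as follows,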





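$$\begin{aligned} \text{maximize} \quad & \sum_{y \in \mathbb{F}_q^n} z(y) \\ \text{subject to} \quad & \sum_{y \in I_s(x)} z(y) \le 1, \quad \forall x \in \mathbb{F}_q^{n-s}, \\ & z(y) \in \mathbb{Z}_+, \quad \forall y \in \mathbb{F}_q^n. \end{aligned}$$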

Here the integer variables are denoted $z(y)$, $y \in \mathbb{F}_q^n$. The constraints are that for each vertex $x \in \mathbb{F}_q^{n-s}$, the sum of $z(y)$ over those y for which the hyperedge corresponding to y covers x (i.e., $y \in I_s(x)$) is at most unity. Since a code is an s-deletion correcting code if and only if it is an s-insertion correcting code, a matching of $\mathcal{H}^I_{q,s,n}$ also corresponds to an s-deletion correcting code, and thereby $|C^*_{q,s,n}| = \nu(\mathcal{H}^I_{q,s,n})$.
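The two hypergraphs and the explicit program can be written out for small q, s and n. The sketch below (hypothetical names; single-character symbols, q ≤ 10, assumed) builds the deletion hypergraph (vertex set $\mathbb{F}_q^{n-s}$, one hyperedge $D_s(y)$ per $y \in \mathbb{F}_q^n$) and the insertion hypergraph (vertex set $\mathbb{F}_q^{n+s}$, one hyperedge $I_s(y)$ per y), and checks a candidate code in three ways: pairwise-disjoint deletion sets, pairwise-disjoint insertion sets, and the vertex constraints $\sum_{y \in I_s(x)} z(y) \le 1$ with z the indicator of the code. As test input it uses the binary VT code $VT_0(n) = \{x : \sum_i i\,x_i \equiv 0 \pmod{n+1}\}$, whose definition is recalled here as an assumption since this section does not restate it.

```python
from itertools import product


def all_strings(q: int, n: int) -> list:
    """All strings of length n over the symbols '0', ..., str(q - 1) (assumes q <= 10)."""
    return ["".join(t) for t in product([str(a) for a in range(q)], repeat=n)]


def deletion_set(x: str, s: int) -> set:
    current = {x}
    for _ in range(s):
        current = {y[:i] + y[i + 1:] for y in current for i in range(len(y))}
    return current


def insertion_set(x: str, s: int, q: int) -> set:
    symbols = [str(a) for a in range(q)]
    current = {x}
    for _ in range(s):
        current = {y[:i] + a + y[i:] for y in current for i in range(len(y) + 1) for a in symbols}
    return current


def deletion_hypergraph(q: int, s: int, n: int) -> dict:
    """Hyperedges y -> D_s(y) for every y in F_q^n (vertices lie in F_q^{n-s})."""
    return {y: deletion_set(y, s) for y in all_strings(q, n)}


def insertion_hypergraph(q: int, s: int, n: int) -> dict:
    """Hyperedges y -> I_s(y) for every y in F_q^n (vertices lie in F_q^{n+s})."""
    return {y: insertion_set(y, s, q) for y in all_strings(q, n)}


def is_matching(hyperedges: dict, code) -> bool:
    """True if the hyperedges indexed by the codewords are pairwise disjoint."""
    covered = set()
    for y in code:
        if covered & hyperedges[y]:
            return False
        covered |= hyperedges[y]
    return True


def satisfies_vertex_constraints(q: int, s: int, n: int, code) -> bool:
    """Explicit program: for every x in F_q^{n-s}, at most one codeword lies in I_s(x)."""
    chosen = set(code)
    return all(len(insertion_set(x, s, q) & chosen) <= 1 for x in all_strings(q, n - s))


def vt_code(n: int, a: int = 0) -> list:
    """Binary VT code VT_a(n) = {x : sum_i i*x_i = a (mod n+1)} (standard definition, assumed)."""
    return [y for y in all_strings(2, n) if sum(i * int(c) for i, c in enumerate(y, 1)) % (n + 1) == a]


if __name__ == "__main__":
    n = 6
    code = vt_code(n)
    print(len(code), is_matching(deletion_hypergraph(2, 1, n), code),
          is_matching(insertion_hypergraph(2, 1, n), code),
          satisfies_vertex_constraints(2, 1, n, code))
```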

Another characterization of the optimal codebook 
adopted in [19], [18], [1] employs the following graph. 

Definition 3.1: Let $L_{q,s,n}$ be the graph with vertex set $\mathbb{F}_q^n$ wherein two vertices are adjacent if their Levenshtein distance is at most 2s.

The optimal s-deletion codebook corresponds to the maximum independent set in this graph. The Levenshtein distance (restricted to $\mathbb{F}_q^n \times \mathbb{F}_q^n$) is twice the shortest path metric on the graph $L_{q,1,n}$. The hypergraph characterization relates to this characterization through the concept of a line graph. Specifically,

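Lemma 3.1: For any $q,s,n\in\mathbb{N}$, the graph $L_{q,s,n}$ is the line graph of hypergraph $\mathcal{H}^I_{q,s,n}$ and of hypergraph $\mathcal{H}^D_{q,s,n}$. Consequently,

$$\nu(\mathcal{H}^D_{q,s,n}) = \alpha(L_{q,s,n}) = |C^*_{q,s,n}|, \qquad \nu(\mathcal{H}^I_{q,s,n}) = \alpha(L_{q,s,n}) = |C^*_{q,s,n}|.$$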

Proof: By Definition 2.4 of Levenshtein distance and by Lemma 2.1, two vertices in $L_{q,s,n}$ share an edge if and only if their $s$-deletion (and $s$-insertion) sets intersect. Consequently, $L_{q,s,n} = L(\mathcal{H}^D_{q,s,n}) = L(\mathcal{H}^I_{q,s,n})$. By (15), the matching numbers of $\mathcal{H}^D_{q,s,n}$ and $\mathcal{H}^I_{q,s,n}$ are both equal to the independence number of $L_{q,s,n}$. ■
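Lemma 3.1 can be checked on toy parameters by brute force. The Python sketch below is our own illustration, not the authors' code, and the helper names (`deletion_set`, `insertion_set`, `levenshtein_graph`, `independence_number`) are hypothetical. It confirms on small cases that $s$-deletion sets intersect exactly when $s$-insertion sets do, builds $L_{q,s,n}$ as the line graph of $\mathcal{H}^D_{q,s,n}$, and computes $\alpha(L_{q,s,n})$ by exhaustive branching, which is only feasible for very small $n$.

```python
from itertools import combinations, product


def deletion_set(x, s):
    """D_s(x): distinct subsequences of the tuple x obtained by deleting exactly s symbols."""
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def is_subsequence(x, y):
    """True if x can be obtained from y by deletions (consumes an iterator over y)."""
    it = iter(y)
    return all(symbol in it for symbol in x)


def insertion_set(x, s, q):
    """I_s(x): strings of length len(x)+s over {0,...,q-1} that contain x as a subsequence."""
    return {y for y in product(range(q), repeat=len(x) + s) if is_subsequence(x, y)}


def line_graphs_agree(q, s, n):
    """Check that D_s-sets intersect exactly when I_s-sets intersect (the content of Lemma 2.1)."""
    words = list(product(range(q), repeat=n))
    dsets = {x: deletion_set(x, s) for x in words}
    isets = {x: insertion_set(x, s, q) for x in words}
    return all(bool(dsets[x] & dsets[y]) == bool(isets[x] & isets[y])
               for x, y in combinations(words, 2))


def levenshtein_graph(q, s, n):
    """Adjacency sets of L_{q,s,n}, built as the line graph of H^D_{q,s,n}."""
    words = list(product(range(q), repeat=n))
    dsets = {x: deletion_set(x, s) for x in words}
    adj = {x: set() for x in words}
    for x, y in combinations(words, 2):
        if dsets[x] & dsets[y]:
            adj[x].add(y)
            adj[y].add(x)
    return adj


def independence_number(adj, candidates=None):
    """Exact alpha(G) by branching on a maximum-degree vertex (fine for tiny graphs only)."""
    if candidates is None:
        candidates = frozenset(adj)
    if not candidates:
        return 0
    v = max(candidates, key=lambda u: len(adj[u] & candidates))
    if not adj[v] & candidates:
        return len(candidates)  # no edges left, so every remaining vertex can be kept
    skip_v = independence_number(adj, candidates - {v})
    take_v = 1 + independence_number(adj, candidates - {v} - adj[v])
    return max(skip_v, take_v)


print(line_graphs_agree(2, 1, 4), independence_number(levenshtein_graph(2, 1, 5)))
```

For $q = 2$, $s = 1$ and $n = 4, 5$, the exhaustive search should return the sizes of $VT_0(4)$ and $VT_0(5)$ (4 and 6), which are known to be optimal at these lengths.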
If one attempts to upper bound the size of a code by packing graph $L_{q,s,n}$ with non-overlapping neighborhoods centered around strings in $\mathbb{F}_q^n$, the main difficulty encountered is that the resulting neighborhoods are not of the same size. This property of the Levenshtein distance is a fundamental departure from, say, the Hamming distance, under which the neighborhoods of all strings have the same size.

Alternatively, one may pack $\mathbb{F}_q^{n-s}$ with deletion sets of strings in $\mathbb{F}_q^n$. This approach too encounters the difficulty that deletion sets are of different sizes. For example, for $s = 1$, if one argues that

$$|C^*_{q,1,n}|\,\min_{x\in\mathbb{F}_q^n}|D_1(x)| \le \sum_{x\in C^*_{q,1,n}}|D_1(x)| \le q^{n-1},$$

then, since $\min_{x\in\mathbb{F}_q^n}|D_1(x)| = 1$, one gets the bound $|C^*_{q,1,n}|\le q^{n-1}$, which is far weaker than the asymptotic bound (the ratio of $q^{n-1}$ to $\frac{q^n}{(q-1)n}$ is $\frac{(q-1)n}{q}$, which grows without bound). A similar situation results for $s > 1$. Levenshtein's bound (2) is obtained by a refinement of this approach in which strings are classified into two categories based on their number of runs.

Since insertion correction and deletion correction are equivalent, and since insertion sets are of the same size for each string of a given length (cf. (5)), one may exploit this to pack $\mathbb{F}_q^{n+s}$ with insertion sets. Unfortunately, this leads to a weak upper bound. For example, for $s = 1$ we get the bound $\frac{q^{n+1}}{n(q-1)+q}$, which is asymptotically $q$ times larger than the known upper bound (for the binary alphabet, the known asymptotic size is $\frac{2^n}{n}$).

The approaches of packing deletion sets or insertion sets can be conceptually unified by casting them as matching problems on hypergraphs $\mathcal{H}^D_{q,s,n}$ and $\mathcal{H}^I_{q,s,n}$, respectively. Since insertion sets are of the same size, hypergraph $\mathcal{H}^I_{q,s,n}$ is uniform [37]; indeed, the matching problem is well studied on uniform hypergraphs (see, e.g., [37, Chapter 3], [41] and [42]). It is a quirk of the problem of deletion-correcting codes that, although the characterization of $C^*_{q,s,n}$ via $\mathcal{H}^I_{q,s,n}$ is analytically convenient and well studied, it leads to a weak bound.

The other hypergraph, $\mathcal{H}^D_{q,s,n}$, is regular, since all vertices in $\mathcal{H}^D_{q,s,n}$ have the same number of hyperedges covering them [37]: a vertex $x$ is covered exactly by the hyperedges $D_s(y)$ with $y\in I_s(x)$, and $|I_s(x)|$ does not depend on $x$. Although this hypergraph does not belong to a category where the matching problem appears to be well studied, we show in the following sections that, if appropriately tackled, it does lead to a better bound. The crux of the proof of our bound lies in tackling this hypergraph.
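The contrast between the two hypergraphs is easy to see on small cases. The sketch below (our own, with hypothetical helper names) builds $\mathcal{H}^D_{q,s,n}$, counts how many hyperedges cover each vertex, and compares that count with the standard closed form $|I_s(x)| = \sum_{i=0}^{s}\binom{n}{i}(q-1)^i$ for $x$ of length $n-s$, which we assume is the formula referred to as (5). It also shows that the hyperedge sizes $|D_s(y)|$ vary, so $\mathcal{H}^D_{q,s,n}$ is regular but not uniform.

```python
from collections import Counter
from itertools import combinations, product
from math import comb


def deletion_set(x, s):
    """D_s(x): distinct subsequences of x obtained by deleting exactly s symbols."""
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def hyperedges(q, s, n):
    """Hyperedges D_s(y), y in F_q^n, of H^D_{q,s,n}; vertices are strings of length n - s."""
    return {y: deletion_set(y, s) for y in product(range(q), repeat=n)}


def vertex_degrees(q, s, n):
    """For each vertex x, the number of hyperedges of H^D_{q,s,n} that contain it."""
    degree = Counter()
    for edge in hyperedges(q, s, n).values():
        degree.update(edge)
    return degree


def insertion_set_size(q, s, n):
    """|I_s(x)| for any x of length n - s: sum_{i=0}^{s} C(n, i) (q-1)^i."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(s + 1))


deg = vertex_degrees(2, 2, 6)
print(set(deg.values()), insertion_set_size(2, 2, 6))
print(sorted({len(e) for e in hyperedges(2, 2, 6).values()}))
```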

B. The non-asymptotic upper bounds for single-deletion 
correcting codes 

In this subsection we present bounds on single-deletion correcting codes. The bounds we obtain are based on two concepts. The first is that the number of runs of a string (recall Definition 2.1) does not decrease under the operation of insertion. The second is the property that the size of the single-deletion set is equal to the number of runs (cf. (4)). We first note the monotonicity.

Lemma 3.2: Let $q,n\in\mathbb{N}$ and let $x\in\mathbb{F}_q^n$ be a string. Then for any supersequence $y\in I_1(x)$, the numbers of runs of $x$ and $y$ satisfy $r(x)\le r(y)$.

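This lemma is quite obvious, since inserting a symbol either lengthens an existing run or creates new runs, and never merges two runs; we omit the proof for brevity.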
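Both facts used in this subsection, $|D_1(x)| = r(x)$ from (4) and the monotonicity of Lemma 3.2, can be checked exhaustively for short strings. This is a minimal sketch of our own; the function names are hypothetical.

```python
from itertools import groupby, product


def runs(x):
    """r(x): number of maximal runs of equal symbols in x."""
    return sum(1 for _ in groupby(x))


def single_deletions(x):
    """D_1(x) for a tuple x."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def single_insertions(x, q):
    """I_1(x) for a tuple x over the alphabet {0, ..., q-1}."""
    return {x[:i] + (a,) + x[i:] for i in range(len(x) + 1) for a in range(q)}


def check_runs_facts(q, n):
    """Check |D_1(x)| = r(x) and r(x) <= r(y) for every y in I_1(x), over all x in F_q^n."""
    for x in product(range(q), repeat=n):
        if len(single_deletions(x)) != runs(x):
            return False
        if any(runs(y) < runs(x) for y in single_insertions(x, q)):
            return False
    return True


print(all(check_runs_facts(2, n) for n in range(1, 9)))
```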
Our proof utilizes Lemma 2.4; for easy reference, the fractional transversal problem of $\mathcal{H}^D_{q,1,n}$ is written below explicitly:

$$\begin{aligned}\text{minimize}\quad & \sum_{x\in\mathbb{F}_q^{n-1}} w(x)\\ \text{subject to}\quad & \sum_{x\in D_1(y)} w(x)\ge 1, && \forall y\in\mathbb{F}_q^n,\\ & w(x)\ge 0, && \forall x\in\mathbb{F}_q^{n-1}.\end{aligned}$$

Notice that the variables are $w(x)$, $x\in\mathbb{F}_q^{n-1}$, and the constraint is that for any $y\in\mathbb{F}_q^n$, the sum of $w(x)$ over those $x$ that are covered by the hyperedge corresponding to $y$ (i.e., $x\in D_1(y)$) is at least unity.

Theorem 3.1: Let $q,n\in\mathbb{N}$, $q\ge 2$, $n\ge 2$. The optimal $q$-ary single-deletion correcting code $C^*_{q,1,n}$ satisfies

$$|C^*_{q,1,n}| \le \frac{q^n-q}{(q-1)(n-1)}.$$



Proof: By Lemma 3.1, the size of the largest single-deletion correcting code equals the matching number of hypergraph $\mathcal{H}^D_{q,1,n}$, i.e., $\nu(\mathcal{H}^D_{q,1,n}) = |C^*_{q,1,n}|$. By Lemma 2.4, to show the required upper bound on $\nu(\mathcal{H}^D_{q,1,n})$ it suffices to construct a fractional transversal of $\mathcal{H}^D_{q,1,n}$ with weight equal to $\frac{q^n-q}{(q-1)(n-1)}$. To this end, consider the fractional transversal $w$, where the component of $w$ corresponding to string $x\in\mathbb{F}_q^{n-1}$, denoted $w(x)$, is given by

$$w(x) = \frac{1}{r(x)}, \qquad \forall x\in\mathbb{F}_q^{n-1},$$

where $r(x)$ is the number of runs of $x$. Clearly, $w\ge 0$. To show that $w$ is indeed a fractional transversal, observe that for any $y\in\mathbb{F}_q^n$,

$$\sum_{x\in D_1(y)}\frac{1}{r(x)} \overset{(a)}{\ge} \sum_{x\in D_1(y)}\frac{1}{r(y)} = \frac{|D_1(y)|}{r(y)} \overset{(b)}{=} 1.$$

The inequality in (a) follows from the monotonicity relationship claimed in Lemma 3.2, and the equality in (b) follows from the size of the deletion set, given in (4). It only remains to calculate the weight of this transversal. For this, note that the number of strings of length $n-1$ with exactly $r$ runs is $q(q-1)^{r-1}\binom{n-2}{r-1}$. This is because we have $q$ choices for the symbol of the first run, and for every subsequent run we have $q-1$ choices for its symbol. The number of choices for the lengths of the runs equals the number of integral solutions $(t_1,\dots,t_r)$ to

$$\sum_{i=1}^{r} t_i = n-1, \qquad t_i\ge 1,\ 1\le i\le r,$$

which, by Lemma 2.2, is $\binom{n-2}{r-1}$. Consequently, the weight of $w$ is

$$\sum_{r=1}^{n-1} q(q-1)^{r-1}\binom{n-2}{r-1}\frac{1}{r} \overset{(c)}{=} \frac{q}{n-1}\sum_{r=1}^{n-1}(q-1)^{r-1}\binom{n-1}{r} = \frac{q\left((1+(q-1))^{n-1}-\binom{n-1}{0}\right)}{(q-1)(n-1)} = \frac{q^n-q}{(q-1)(n-1)}.$$

In (c), we have simplified $\binom{n-2}{r-1}\frac{1}{r} = \frac{(n-2)!}{(n-r-1)!\,(r-1)!\,r} = \frac{(n-2)!}{(n-r-1)!\,r!} = \frac{1}{n-1}\binom{n-1}{r}$. Hence, by Lemma 2.4, $\frac{q^n-q}{(q-1)(n-1)}$ is an upper bound on $|C^*_{q,1,n}|$. ■
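The weight computation in the proof can be cross-checked with exact rational arithmetic. The sketch below (our own; function names are hypothetical) sums $w(x) = 1/r(x)$ directly, sums it again via the run-count formula $q(q-1)^{r-1}\binom{n-2}{r-1}$, compares both with $\frac{q^n-q}{(q-1)(n-1)}$, and confirms feasibility by computing the smallest hyperedge load $\sum_{x\in D_1(y)} w(x)$.

```python
from fractions import Fraction
from itertools import groupby, product
from math import comb


def runs(x):
    """r(x): number of maximal runs in x."""
    return sum(1 for _ in groupby(x))


def weight_bruteforce(q, n):
    """Total weight of w(x) = 1/r(x) over all x in F_q^{n-1}."""
    return sum(Fraction(1, runs(x)) for x in product(range(q), repeat=n - 1))


def weight_by_run_count(q, n):
    """Same total, grouping strings of length n-1 by their number of runs r."""
    return sum(Fraction(q * (q - 1) ** (r - 1) * comb(n - 2, r - 1), r) for r in range(1, n))


def min_hyperedge_load(q, n):
    """Smallest sum of w over a hyperedge D_1(y), y in F_q^n; feasibility requires >= 1."""
    return min(sum(Fraction(1, runs(x)) for x in {y[:i] + y[i + 1:] for i in range(n)})
               for y in product(range(q), repeat=n))


def theorem_3_1_bound(q, n):
    return Fraction(q ** n - q, (q - 1) * (n - 1))


print(theorem_3_1_bound(2, 13), theorem_3_1_bound(2, 13) // 1)
```

For $q = 2$, $n = 13$ the bound is $1365/2$, whose floor, 682, is the value reported for $n = 13$ in Table I.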
Although this bound is non-asymptotic, as a corollary we 
get the asymptotic results of Levenshtein [2] and Tenengolts 
[5]. 

Corollary 3.2: The optimal single-deletion correcting code for binary alphabet has size that asymptotically satisfies

$$|C^*_{2,1,n}| \sim \frac{2^n}{n}.$$

The optimal single-deletion correcting code for $q$-ary alphabet satisfies

$$|C^*_{q,1,n}| \lesssim \frac{q^n}{(q-1)n}.$$

Proof: For binary alphabet, Levenshtein [2] shows that the VT codes correct single deletions. These codes are of size at least $\frac{2^n}{n+1}$, whereby $|C^*_{2,1,n}|\ge\frac{2^n}{n+1}$. Combining this with Theorem 3.1 shows that

$$\frac{2^n}{n+1} \le |C^*_{2,1,n}| \le \frac{2^n-2}{n-1}.$$

Thus $|C^*_{2,1,n}|/(2^n/n)\to 1$. For the $q$-ary case, since by Theorem 3.1 $|C^*_{q,1,n}|\le\frac{q^n-q}{(q-1)(n-1)}$, we have $\limsup_{n\to\infty}\frac{|C^*_{q,1,n}|}{q^n/((q-1)n)}\le 1$. ■
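The sandwich used in the proof can be observed numerically. This sketch is our own (`vt0` and `corrects_single_deletion` are hypothetical helper names); it enumerates $VT_0(n)$ from its defining congruence $\sum_{i=1}^{n} i\,x_i\equiv 0 \pmod{n+1}$, checks for small $n$ that its single-deletion sets are pairwise disjoint, and compares its size with $2^n/(n+1)$ and with the Theorem 3.1 bound $(2^n-2)/(n-1)$.

```python
from fractions import Fraction
from itertools import product


def vt0(n):
    """VT_0(n): binary strings x with sum_{i=1}^{n} i * x_i = 0 mod (n + 1)."""
    return [x for x in product((0, 1), repeat=n)
            if sum(i * b for i, b in enumerate(x, start=1)) % (n + 1) == 0]


def corrects_single_deletion(code):
    """True if the single-deletion sets of the codewords are pairwise disjoint."""
    seen = set()
    for x in code:
        d = {x[:i] + x[i + 1:] for i in range(len(x))}
        if d & seen:
            return False
        seen |= d
    return True


for n in range(2, 15):
    size = len(vt0(n))
    lower = Fraction(2 ** n, n + 1)
    upper = Fraction(2 ** n - 2, n - 1)  # Theorem 3.1 with q = 2
    print(n, size, float(lower), float(upper))
```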



IV. NON-ASYMPTOTIC UPPER BOUNDS FOR
MULTIPLE-DELETION CORRECTING CODES AND THE
ASYMPTOTIC RATE FUNCTION

We now extend the logic used in the bound above to channels with multiple deletions.

As in the single-deletion case, we will use the hypergraph $\mathcal{H}^D_{q,s,n}$ to obtain our bound. The key property employed in the proof of Theorem 3.1 was that the number of runs of a string does not decrease under the insertion of a symbol. This is in fact a specific consequence of a more general property shown by Hirschberg and Regnier [26, Lemma 3.1]: for any $s$, the size of the $s$-deletion set of a string does not decrease under the insertion of a symbol. This result is articulated in the following lemma. Here, if $x = x_1x_2\cdots x_n$ and $y = y_1y_2\cdots y_m$ are $q$-ary strings, '$xy$' denotes the string $x_1x_2\cdots x_ny_1y_2\cdots y_m$.

Lemma 4.1: Let $s\in\mathbb{N}$. For any strings $x,y\in\mathbb{F}_q^*$ and any symbol $a\in\mathbb{F}_q$, $|D_s(xy)|\le|D_s(xay)|$.

The original result from [26, Lemma 3.1] seems to pertain to nonempty strings $x, y$; this is apparent from their proof. However, the extension to the case where one of $x, y$ is empty is trivial, and we have included it in the above statement. The consequence is that, in this lemma, $a$ can be thought of as a symbol inserted into an existing string $xy$. A recursive application of Lemma 4.1 then immediately yields that for any $s$ and any string $x\in\mathbb{F}_q^*$,

$$|D_s(x)|\le|D_s(y)|, \qquad \forall y\in I_s(x). \tag{16}$$



Looking back at the size of the single-deletion set from (4), 
one sees that the monotonicity relationship of Lemma 3.2 
is a special case of (16). 
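Lemma 4.1 is a published result, but a brute-force check on short strings makes the statement concrete, including the edge cases where $x$ or $y$ is empty. The sketch below is our own illustration; `check_lemma_4_1` is a hypothetical name, and it tries every split $w = xy$, every inserted symbol $a$ and every $s$ up to `max_s`.

```python
from itertools import combinations, product


def deletion_set(x, s):
    """D_s(x): distinct subsequences of x obtained by deleting exactly s symbols."""
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def check_lemma_4_1(q, max_len, max_s):
    """Check |D_s(xy)| <= |D_s(xay)| for all strings w = xy with len(w) <= max_len."""
    for m in range(max_len + 1):
        for w in product(range(q), repeat=m):
            for cut in range(m + 1):
                x, y = w[:cut], w[cut:]
                for a in range(q):
                    longer = x + (a,) + y
                    for s in range(1, max_s + 1):
                        if len(deletion_set(w, s)) > len(deletion_set(longer, s)):
                            return False
    return True


print(check_lemma_4_1(2, 6, 3))
```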

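We now exploit (16) to give an upper bound on the size of an $s$-deletion correcting code for arbitrary $s$. The proof utilizes, as before, the fractional transversal problem

$$\begin{aligned}\tau^*(\mathcal{H}^D_{q,s,n}) = \text{minimize}\quad & \sum_{x\in\mathbb{F}_q^{n-s}} w(x)\\ \text{subject to}\quad & \sum_{x\in D_s(y)} w(x)\ge 1, && \forall y\in\mathbb{F}_q^n,\\ & w(x)\ge 0, && \forall x\in\mathbb{F}_q^{n-s}.\end{aligned}$$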

Theorem 4.1: Let $s,q,n\in\mathbb{N}$ such that $n > s$, $q\ge 2$. The optimal $s$-deletion correcting code $C^*_{q,s,n}$ satisfies

$$|C^*_{q,s,n}| \le \sum_{x\in\mathbb{F}_q^{n-s}}\frac{1}{|D_s(x)|}. \tag{17}$$



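Proof: We construct a fractional transversal for $\mathcal{H}^D_{q,s,n}$. Consider the candidate fractional transversal $w$ such that, for any $x\in\mathbb{F}_q^{n-s}$, $w(x) = \frac{1}{|D_s(x)|}$. Obviously, $w\ge 0$. Furthermore, for any $y\in\mathbb{F}_q^n$,

$$\sum_{x\in D_s(y)} w(x) = \sum_{x\in D_s(y)}\frac{1}{|D_s(x)|} \overset{(a)}{\ge} \sum_{x\in D_s(y)}\frac{1}{|D_s(y)|} = 1,$$

where (a) follows from the monotonicity relation (16), since $x\in D_s(y)$ means $y\in I_s(x)$. Thus $w$ is indeed a fractional transversal of $\mathcal{H}^D_{q,s,n}$. Now, by Lemma 2.4, the weight of $w$ is an upper bound on $\nu(\mathcal{H}^D_{q,s,n}) = |C^*_{q,s,n}|$, whereby the result follows. ■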
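The transversal of Theorem 4.1 can be built and checked explicitly for small parameters. The sketch below is our own (`theorem_4_1_transversal` is a hypothetical name). It returns the weight of $w(x) = 1/|D_s(x)|$, which is the right-hand side of (17), together with the smallest hyperedge load; a load of at least one certifies feasibility. We assume $n\ge 2s$ so that every $D_s(x)$ with $x\in\mathbb{F}_q^{n-s}$ is nonempty and the weights are finite.

```python
from fractions import Fraction
from itertools import combinations, product


def deletion_set(x, s):
    """D_s(x): distinct subsequences of x obtained by deleting exactly s symbols."""
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def theorem_4_1_transversal(q, s, n):
    """Return (weight, min_load) of w(x) = 1/|D_s(x)| on H^D_{q,s,n}."""
    assert n >= 2 * s, "need n >= 2s so that every D_s(x) is nonempty"
    w = {x: Fraction(1, len(deletion_set(x, s))) for x in product(range(q), repeat=n - s)}
    min_load = min(sum(w[x] for x in deletion_set(y, s))
                   for y in product(range(q), repeat=n))
    return sum(w.values()), min_load


weight, load = theorem_4_1_transversal(2, 2, 8)
print(weight, float(weight), load)
```

For $s = 1$ the weight coincides with the bound of Theorem 3.1, since $|D_1(x)| = r(x)$.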
In order to derive explicit bounds, we now discuss the sizes of $s$-deletion sets. For $s\le 5$, Mercier et al. [28, Section III.D] give closed-form formulae for the size of $s$-deletion sets, which, unlike in the single-deletion case, have quite a complicated form. Closed-form expressions for 2-deletion sets for binary alphabet are also given by Swart and Ferreira [27] and Sloane [1]. The only results on deletion sets valid for arbitrary $s$ are bounds. For all $x\in\mathbb{F}_q^n$, the $s$-deletion set of $x$ admits the following lower bound, shown recently by Liron and Langberg [29, Theorem VI.2]. For any $s\le n$ and any string $x\in\mathbb{F}_q^n$ with $2\le r(x)\le n$,

$$|D_s(x)| \ge \delta(r(x),s) + \sum_{i=s+r(x)-n-1}^{\min(s-2,\,r(x)-3)}\delta(r(x)-2,i), \tag{18}$$

where

$$\delta(r,s) = \begin{cases}\sum_{i=0}^{s}\binom{r-s}{i}, & r > s\ge 0,\\ 1, & s = r\ge 0,\\ 0, & s < 0 \text{ or } s > r.\end{cases} \tag{19}$$



Notice that this bound on $|D_s(\cdot)|$ is always positive. Additionally, it is an improvement on previous bounds of Levenshtein [25] and Hirschberg and Regnier [26].
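A quick sanity check of (18) and (19) as written above: $\delta(r,s)$ should equal the exact size of the $s$-deletion set of the alternating binary string of length $r$, and the right-hand side of (18) should never exceed the true $|D_s(x)|$. The sketch below is our own illustration, with hypothetical names `delta` and `liron_langberg_lower`; it tests binary strings of length at most 8 only.

```python
from itertools import combinations, groupby, product
from math import comb


def deletion_set(x, s):
    """D_s(x): distinct subsequences of x obtained by deleting exactly s symbols."""
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def runs(x):
    return sum(1 for _ in groupby(x))


def delta(r, s):
    """delta(r, s) of (19): sum_{i=0}^{s} C(r - s, i) when 0 <= s <= r, else 0."""
    if s < 0 or s > r:
        return 0
    return sum(comb(r - s, i) for i in range(s + 1))


def liron_langberg_lower(r, s, n):
    """Right-hand side of (18) for a string of length n with r runs."""
    lower, upper = s + r - n - 1, min(s - 2, r - 3)
    return delta(r, s) + sum(delta(r - 2, i) for i in range(lower, upper + 1))


def alternating(r):
    return tuple(i % 2 for i in range(r))


print([delta(6, s) for s in range(7)])
```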

By using the explicit formulae (e.g., [28], [27], [1]) for the sizes of $s$-deletion sets in (17), one may obtain explicit upper bounds on $|C^*_{q,s,n}|$ for $s\le 5$. For general $s$, we derive an upper bound on the right-hand side of (17) by combining Theorem 4.1 with the lower bound in (18). Note that the explicit formulae will yield tighter bounds than the one below.

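Corollary 4.2: Let $s,q,n\in\mathbb{N}$, $q\ge 2$, $n\ge 2s$. The optimal $s$-deletion correcting code $C^*_{q,s,n}$ satisfies

$$|C^*_{q,s,n}| \le U_{q,s,n},$$

where

$$U_{q,s,n} = \sum_{r=3}^{n-s}\frac{q(q-1)^{r-1}\binom{n-s-1}{r-1}}{\delta(r,s)+\sum_{i=s+r-(n-s)-1}^{\min(s-2,\,r-3)}\delta(r-2,i)} + \sum_{r=1}^{2}q(q-1)^{r-1}\binom{n-s-1}{r-1}, \tag{20}$$

and $\delta(\cdot,\cdot)$ is as defined in (19).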

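Proof: By Theorem 4.1, we have

$$|C^*_{q,s,n}| \le \sum_{x\in\mathbb{F}_q^{n-s}:\ r(x)\ge 3}\frac{1}{|D_s(x)|} + \sum_{x\in\mathbb{F}_q^{n-s}:\ r(x)\le 2}\frac{1}{|D_s(x)|}.$$

For $n-s\ge s$ and strings $x\in\mathbb{F}_q^{n-s}$ such that $r(x)\ge 3$, the bound in (18) applies; furthermore, notice that for such $x$, the bound in (18) is strictly positive. So, using (18) in the equation above, together with the fact that $\mathbb{F}_q^{n-s}$ contains exactly $q(q-1)^{r-1}\binom{n-s-1}{r-1}$ strings with $r$ runs (as in the proof of Theorem 3.1), the first sum can be upper-bounded, and the resulting bound is the first term in (20). Since $|D_s(x)|\ge 1$, the second sum in the equation above admits the trivial upper bound $|\{x\in\mathbb{F}_q^{n-s} : r(x)\le 2\}|$, which is the second term in (20). Hence the bound. ■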
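To see how much is lost in passing from (17) to $U_{q,s,n}$, both can be computed for small binary cases. This sketch is our own; `U_bound` and `rhs_17` are hypothetical names. As a safeguard we add `max(lb, 1)`, i.e. we fall back on $|D_s(x)|\ge 1$ if the lower bound from (18) were ever zero; for $s\le 3$ and $r\ge 3$ this guard never triggers, because then $\delta(r,s)\ge 1$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def deletion_set(x, s):
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def delta(r, s):
    if s < 0 or s > r:
        return 0
    return sum(comb(r - s, i) for i in range(s + 1))


def liron_langberg_lower(r, s, n):
    lower, upper = s + r - n - 1, min(s - 2, r - 3)
    return delta(r, s) + sum(delta(r - 2, i) for i in range(lower, upper + 1))


def U_bound(q, s, n):
    """U_{q,s,n} of (20); the vertex strings have length m = n - s."""
    m = n - s

    def count(r):  # number of strings of length m with exactly r runs
        return q * (q - 1) ** (r - 1) * comb(m - 1, r - 1)

    total = Fraction(count(1) + count(2))  # r <= 2: trivial bound |D_s(x)| >= 1
    for r in range(3, m + 1):
        lb = liron_langberg_lower(r, s, m)
        total += Fraction(count(r), max(lb, 1))  # guard is our addition
    return total


def rhs_17(q, s, n):
    """Brute-force right-hand side of (17)."""
    return sum(Fraction(1, len(deletion_set(x, s))) for x in product(range(q), repeat=n - s))


for s, n in [(2, 8), (2, 10), (3, 9)]:
    print(s, n, float(rhs_17(2, s, n)), float(U_bound(2, s, n)))
```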
One of the aims of this paper was to produce non-asymptotic upper bounds that imply known asymptotic bounds. We now show that the bound $U_{q,s,n}$ meets this purpose. Our main result is that $U_{q,s,n}$ (and the expression $\sum_{x\in\mathbb{F}_q^{n-s}}\frac{1}{|D_s(x)|}$) implies the previous results of Levenshtein [2] stated in (1) for $q = 2$, and generalizes these results to $q$-ary alphabets.

In order to do this, we first show a lower bound on (the upper bound) $U_{q,s,n}$. For this we recall an upper bound on sizes of deletion sets due to Levenshtein [25]: for any $n, q\in\mathbb{N}$,

$$|D_s(x)| \le \binom{r(x)+s-1}{s}, \qquad \forall x\in\mathbb{F}_q^n. \tag{21}$$

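Lemma 4.2: Let $q,s,n\in\mathbb{N}$, $n\ge 2s$, $q\ge 2$. The upper bound $U_{q,s,n}$ satisfies the lower bound

$$U_{q,s,n} \ge \sum_{x\in\mathbb{F}_q^{n-s}}\frac{1}{|D_s(x)|} \ge \frac{q^n-q\sum_{r=0}^{s-1}(q-1)^r\binom{n-1}{r}}{(q-1)^s\binom{n-1}{s}}.$$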



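Proof: The first inequality follows from the proof of Corollary 4.2. To show the second inequality, use the upper bound on $|D_s(\cdot)|$ from (21) to get

$$\sum_{x\in\mathbb{F}_q^{n-s}}\frac{1}{|D_s(x)|} \ge \sum_{x\in\mathbb{F}_q^{n-s}}\frac{1}{\binom{r(x)+s-1}{s}} = \sum_{r=1}^{n-s}\frac{q(q-1)^{r-1}\binom{n-s-1}{r-1}}{\binom{r+s-1}{s}} \overset{(a)}{=} \frac{q}{\binom{n-1}{s}}\sum_{r=1}^{n-s}(q-1)^{r-1}\binom{n-1}{r+s-1} = \frac{q^n-q\sum_{r=0}^{s-1}(q-1)^r\binom{n-1}{r}}{(q-1)^s\binom{n-1}{s}}.$$

In (a) we have used that $\binom{n-s-1}{r-1}\big/\binom{r+s-1}{s} = \binom{n-1}{r+s-1}\big/\binom{n-1}{s}$. This proves the claim. ■

Notice that the above calculations are a generalization of our proof of the bound on single-deletion correcting codes in Theorem 3.1.

We now prove the asymptotics of $U_{q,s,n}$ by deriving a matching asymptotic upper bound.

Theorem 4.3: Let $q,s\in\mathbb{N}$, $q\ge 2$. The upper bound on $s$-deletion correcting codes $U_{q,s,n}$ satisfies

$$U_{q,s,n} \sim \sum_{x\in\mathbb{F}_q^{n-s}}\frac{1}{|D_s(x)|} \sim \frac{s!\,q^n}{(q-1)^s n^s}$$

as $n\to\infty$. Consequently, as $n\to\infty$,

$$|C^*_{q,s,n}| \lesssim \frac{s!\,q^n}{(q-1)^s n^s}.$$

Proof: The right-hand side of Lemma 4.2 is asymptotically equal to $\frac{s!\,q^n}{(q-1)^s n^s}$. Thanks to Lemma 4.2, to prove the first set of asymptotics it therefore suffices to show that $U_{q,s,n}\lesssim\frac{s!\,q^n}{(q-1)^s n^s}$ as $n\to\infty$.

Fix $r'\in\mathbb{N}$, $1\le s\le r'\le n-s$. We first claim that $U_{q,s,n}$ satisfies

$$U_{q,s,n} \le \sum_{r=r'}^{n-s}\frac{q(q-1)^{r-1}\binom{n-s-1}{r-1}}{\delta(r',s)} + \sum_{r=1}^{r'-1}q(q-1)^{r-1}\binom{n-s-1}{r-1}. \tag{22}$$

To see this, use (19) to conclude that, for any $r\ge r'$,

$$\delta(r,s) + \sum_{i=s+r-(n-s)-1}^{\min(s-2,\,r-3)}\delta(r-2,i) \ge \delta(r,s) \ge \delta(r',s),$$

and thus bound the terms in (20) corresponding to $r\ge r'$. For the terms corresponding to $r < r'$, employ the trivial lower bound of one on the denominators in (20). Since the first sum in (22) counts at most $|\mathbb{F}_q^{n-s}| = q^{n-s}$ strings, Eq. (22) further implies

$$U_{q,s,n} \le \frac{q^{n-s}}{\delta(r',s)} + \sum_{r=1}^{r'-1}q(q-1)^{r-1}\binom{n-s-1}{r-1}. \tag{23}$$

Consider a binomial distribution with parameters $n-s-1$ and $\frac{q-1}{q}$. The Chernoff bound on the cumulative binomial distribution implies that, for $r'-1 < \frac{q-1}{q}(n-s-1)$, the sum $\sum_{r=1}^{r'}q(q-1)^{r-1}\binom{n-s-1}{r-1}$ is no more than

$$q^{n-s}\exp\left(-\frac{\left(\frac{q-1}{q}(n-s-1)-(r'-1)\right)^2}{2\,\frac{q-1}{q}(n-s-1)}\right).$$

Setting $r' = \hat r$ in (23), where $\hat r$ lies below $\frac{q-1}{q}(n-s-1)$ by a margin of order $\sqrt{(n-s-1)\log(n-s-1)}$, large enough that the exponential factor above decays faster than $n^{-s}$, and using the Chernoff bound and the fact that $\delta(\hat r,s)\sim\frac{1}{s!}\left(\frac{q-1}{q}\right)^s n^s$ as $n\to\infty$, we get

$$U_{q,s,n} \lesssim \frac{s!\,q^n}{(q-1)^s n^s}$$

as $n\to\infty$. Combining this bound with Lemma 4.2, we get $U_{q,s,n}\sim\sum_{x\in\mathbb{F}_q^{n-s}}\frac{1}{|D_s(x)|}\sim\frac{s!\,q^n}{(q-1)^s n^s}$. Finally, by Corollary 4.2, we get $|C^*_{q,s,n}|\lesssim\frac{s!\,q^n}{(q-1)^s n^s}$. ■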

Note that, in addition to clarifying the asymptotics of $U_{q,s,n}$, the above theorem shows that using explicit formulae for $|D_s(\cdot)|$ in (17) does not lead to any improvement over $U_{q,s,n}$ in an asymptotic sense.
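The sandwich of Lemma 4.2 and the slow approach to the common asymptotic value of Theorem 4.3 can be observed for small binary cases. The sketch below is our own (`lemma_4_2_lower`, `rhs_17` and `asymptotic_value` are hypothetical names); it prints the ratio of the closed-form lower bound to the exact right-hand side of (17), and the ratio of (17) to $s!\,q^n/((q-1)^s n^s)$. For $s = 1$ the closed form coincides with (17), as in Theorem 3.1.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial


def deletion_set(x, s):
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def rhs_17(q, s, n):
    """Exact right-hand side of (17) by enumeration of F_q^{n-s}."""
    return sum(Fraction(1, len(deletion_set(x, s))) for x in product(range(q), repeat=n - s))


def lemma_4_2_lower(q, s, n):
    """Closed-form lower bound of Lemma 4.2 on the right-hand side of (17)."""
    numerator = q ** n - q * sum((q - 1) ** r * comb(n - 1, r) for r in range(s))
    return Fraction(numerator, (q - 1) ** s * comb(n - 1, s))


def asymptotic_value(q, s, n):
    """s! q^n / ((q-1)^s n^s), the common asymptotic value in Theorem 4.3."""
    return Fraction(factorial(s) * q ** n, (q - 1) ** s * n ** s)


for n in range(6, 13):
    exact = rhs_17(2, 2, n)
    print(n, float(lemma_4_2_lower(2, 2, n) / exact), float(exact / asymptotic_value(2, 2, n)))
```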

Notice that the right-hand side in (23) closely resembles the expression in Levenshtein's bound from (2). In fact, Levenshtein's expression in (2) contains the term $\binom{n-1}{r-1}$ in place of $\binom{n-s-1}{r-1}$, and therefore appears to be weaker than (23). However, this observation does not directly translate to a proof that our bound $U_{q,s,n}$ is stronger than Levenshtein's bound. This is because the parameter $r$ in (2) is allowed to vary between $s-1$ and $n-1$, whereas in (23), $r'$ is allowed to vary between $s-1$ and $n-s$. If one could make the deft argument that, for any $n, s$, values of $r$ in (2) beyond $n-s$ are inconsequential to the comparison of (2) with $U_{q,s,n}$, one could establish that $U_{q,s,n}$ is indeed a better bound than Levenshtein's. We have empirically found that this is true; we discuss this in Section VI.

Finally, it is evident that the bound $U_{q,s,n}$, while explicit, is hard to reduce to a closed form for any $s > 1$. It appears that the single-deletion case is a unique one that allows for a neat calculation of a closed-form expression.

A. The asymptotic rate function

Consider the case of a deletion channel where a fraction $\tau\in[0,1]$ of the symbols in a $q$-ary string are deleted. Denote by $R_q(\tau)$ the asymptotic value of the rate of the largest code for this channel:

$$R_q(\tau) = \lim_{n\to\infty}\frac{1}{n}\log_q|C^*_{q,\tau n,n}|. \tag{24}$$

We call $R_q(\tau)$ the asymptotic rate function for the deletion channel. Very little seems to be known about this function. Levenshtein's non-asymptotic bounds from (2) only lead to the conclusion $R_2(\tau) < 0.7729$ for $\tau > 0.0757$ [6]. In this section we show that our non-asymptotic bound $U_{q,s,n}$ from Corollary 4.2 allows for the calculation of a finer bound on $R_q(\cdot)$.

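In order to perform this calculation, we need to address some technicalities. Notice that Corollary 4.2 assumes $n\ge 2s$ to obtain the bound $U_{q,s,n}$. When $s$ was fixed, this restriction was immaterial. But for $s\approx\tau n$, this restriction means that Corollary 4.2 can be used only for $\tau\le\frac12$. For $\tau > \frac12$ we will use the trivial bound

$$|C^*_{q,s,n}| \le q^{n-s}, \tag{25}$$

which holds because the deletion sets of the codewords are nonempty, pairwise disjoint subsets of $\mathbb{F}_q^{n-s}$.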



Denote by $h_q(x)$, $x\in[0,1]$, the function

$$h_q(x) = -x\log_q(x) - (1-x)\log_q(1-x) + x\log_q(q-1),$$

and let $h(\cdot) = h_2(\cdot)$ denote the binary entropy function.

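Theorem 4.4: Consider the asymptotic rate function $R_q(\cdot)$ defined in (24). For $\tau\in[0,\frac12)$, the asymptotic rate function satisfies

$$R_q(\tau) \le \max_{p\in[0,\,1-\tau]} N(p;\tau) - D(p;\tau),$$

where

$$N(p;\tau) = (1-\tau)\,h_q\!\left(\frac{p}{1-\tau}\right), \qquad D(p;\tau) = \max_{t\in[m_{\tau,p},\,\min(\tau,p)]}\frac{(p-t)\,h\!\left(\min\left(\frac{t}{p-t},\frac12\right)\right)}{\log_2 q},$$

and $m_{\tau,p} = \max(2\tau+p-1,\,0)$. For $\tau\in[\frac12,1]$, the asymptotic rate function satisfies

$$R_q(\tau) \le 1-\tau.$$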




Fig. 1: The upper bound on the asymptotic rate function $R_q(\tau)$ guaranteed by Theorem 4.4 for alphabet sizes $q = 2,\dots,5$ and $\tau\in[0,\frac12)$.



The proof is standard, but messy. We have relegated it to 
the Appendix. 

Some remarks about this bound on $R_q(\tau)$ are worth noting. Fig. 1 contains plots of this bound for various alphabet sizes for $\tau\in[0,\frac12)$. For $\tau = 0$, $D(p;\tau) = 0$ and hence $R_q(0)\le\max_{p\in[0,1]}h_q(p) = 1$, which is expected. Thereafter, for small values of $\tau$ (say $\tau < 1/10$), one finds that the rate drops quite sharply. For $\tau\ge\frac12$, the above bound says $R_q(\tau)\le 1-\tau$, and so $R_q(1) = 0$, as expected. One can easily see that this bound on the rate function is superior to Levenshtein's from [6].

However, there are obvious shortcomings to our bound. Notice in Fig. 1 that our bound never hits zero for any $\tau\in[0,\frac12)$; in fact, it becomes zero only for $\tau = 1$. Independently of his bound, Levenshtein [6] argues that $R_q(\tau)$ must be zero for all $\tau$ above a certain threshold smaller than one. Our bound does not imply this property (Levenshtein's bound on the rate function also does not imply this property). Furthermore, in each of the plots in Fig. 1, our bound shows an increase beyond a certain value of $\tau$. The true asymptotic rate function $R_q(\tau)$ must be non-increasing in $\tau$, since a code that corrects a larger fraction of deletions also corrects a smaller one. This indicates that our bound becomes uninformative beyond a certain value of $\tau$.

A fascinating lesson in this is that a non-asymptotic bound such as $U_{q,s,n}$ that yields good asymptotics in one regime may not necessarily do so in other regimes.

V. Bounds on codes for constrained sources 

The bounds obtained in the previous sections pertain to sizes of codebooks for the set of all strings of a particular length over a particular alphabet. We now consider the case where a codebook is sought for a constrained set of source strings in $\mathbb{F}_q^n$, and extend the results obtained above to present bounds for such codes.

Definition 5.1: Let $S\subseteq\mathbb{F}_q^n$ be a set of strings and $s\in\mathbb{N}$. An $s$-deletion correcting code or $s$-deletion codebook for $S$ is a subset $C\subseteq S$ such that the sets $D_s(x)$, $x\in C$, are pairwise disjoint. The largest such code is denoted $C^*_{S,s}$ and called the optimal $s$-deletion correcting code or optimal $s$-deletion codebook for $S$.

Finding a bound on the optimal codebook for an arbitrary set of strings $S$ is significantly more challenging than finding one when $S = \mathbb{F}_q^n$. Specifically, arguments such as those based on Stirling's approximation employed by Levenshtein [2] and Tenengolts [5] rely on the availability of all strings in $\mathbb{F}_q^n$.

We construct our bound by using a suitable hypergraph. Let $S\subseteq\mathbb{F}_q^n$ and define the hypergraph

$$\mathcal{H}_{S,s} = \left(D_s(S),\ \{D_s(x) : x\in S\}\right),$$

where $D_s(S) = \bigcup_{x\in S}D_s(x)$. $\mathcal{H}_{S,s}$ is the partial hypergraph of $\mathcal{H}^D_{q,s,n}$ generated by $S$. By arguments similar to those previously used, it follows that $\nu(\mathcal{H}_{S,s}) = |C^*_{S,s}|$. This matching problem for $\mathcal{H}_{S,s}$ can be explicitly written as follows.





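$$\begin{aligned}\text{maximize}\quad & \sum_{y\in S} z(y)\\ \text{subject to}\quad & \sum_{y\in I_s(x)\cap S} z(y)\le 1, && \forall x\in D_s(S),\\ & z(y)\in\mathbb{Z}_+, && \forall y\in S.\end{aligned}$$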



Notice that in the constraint, the sum is over $y$ belonging to $I_s(x)\cap S$; this is because, for some $x\in D_s(S)$, not all strings in $I_s(x)$ need be present in $S$, and those absent do not correspond to hyperedges in $\mathcal{H}_{S,s}$. In the language of graphs, the codebook $C^*_{S,s}$ is a maximum independent set in $L_{S,s}$, the subgraph of $L_{q,s,n}$ induced by strings in $S$. As before, it is easy to see that $L_{S,s}$ is the line graph of $\mathcal{H}_{S,s}$.

In constructing our bound we exploit the "decoupling" afforded by the fractional transversal problem for $\mathcal{H}_{S,s}$. This problem can be explicitly written as follows:

$$\begin{aligned}\tau^*(\mathcal{H}_{S,s}) = \text{minimize}\quad & \sum_{x\in D_s(S)} w(x)\\ \text{subject to}\quad & \sum_{x\in D_s(y)} w(x)\ge 1, && \forall y\in S,\\ & w(x)\ge 0, && \forall x\in D_s(S).\end{aligned}$$

In this problem there is a separate constraint for each hyperedge, i.e., for each string in $S$. Consequently, a fractional transversal can be constructed for $\mathcal{H}_{S,s}$ for any set $S$ by applying the logic used in Theorem 4.1.

Theorem 5.1: Let $s, n\in\mathbb{N}$, $n > s$, and let $S$ be a set of strings in $\mathbb{F}_q^n$. Then

$$\frac{|S|}{\binom{n+s-1}{s}\,I_{q,s,n}} \le |C^*_{S,s}| \le \sum_{x\in D_s(S)}\frac{1}{|D_s(x)|}.$$



Proof: Notice that the fractional transversal problem for $\mathcal{H}_{S,s}$ contains a constraint for each string $y$ belonging to $S$, and the sum in this constraint is over all $x\in D_s(y)$. Consequently, following Theorem 4.1, we see that $w(x) = \frac{1}{|D_s(x)|}$, $x\in D_s(S)$, is a fractional transversal of $\mathcal{H}_{S,s}$. The upper bound thus follows.

To obtain the lower bound, consider the line graph $L_{S,s}$ of $\mathcal{H}_{S,s}$. The maximum independent set in $L_{S,s}$ is the optimal matching of $\mathcal{H}_{S,s}$, and thereby the largest codebook $C^*_{S,s}$. A well-known bound given by Brooks' theorem or by a "greedy" algorithm for independent set construction [35] gives that

$$\alpha(L_{S,s}) = |C^*_{S,s}| \ge \frac{|S|}{\Delta(L_{S,s})+1},$$

where $\Delta(L_{S,s})$ is the maximum degree of a vertex in $L_{S,s}$. The neighborhood of a vertex $x$ in $L_{S,s}$ comprises those strings obtained from $x$ by deletion of $s$ symbols in $x$ followed by the insertion of $s$ symbols in the resulting subsequence. Consequently,

$$\Delta(L_{S,s}) \le \max_{x\in S}\sum_{y\in D_s(x)}|I_s(y)| - 1 \le \binom{n+s-1}{s}\,I_{q,s,n} - 1, \tag{26}$$

where we have used the upper bound on $|D_s(\cdot)|$ from (21) together with $r(x)\le n$, $I_{q,s,n}$ was defined in (6) as the size of the insertion set for strings in $\mathbb{F}_q^{n-s}$, and the subtracted 1 is because the string itself is counted at least once while counting neighbors produced by deletion and insertion. The result follows. ■
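Both sides of Theorem 5.1 can be evaluated for a small constrained source, and a first-fit greedy code (the constructive argument behind the lower bound) falls between them. The sketch below is our own; the helper names are hypothetical, and as an example source we use the $(d,\infty)$-RLL strings of the next subsection, with 0-runs at both ends, 1-runs of length one and every 0-run of length at least $d$.

```python
from fractions import Fraction
from itertools import combinations, groupby, product
from math import comb


def deletion_set(x, s):
    n = len(x)
    if s > n:
        return set()
    return {tuple(x[i] for i in keep) for keep in combinations(range(n), n - s)}


def is_rll_d_inf(x, d):
    """(d, inf)-RLL as assumed in Section V-A (first and last runs are 0-runs of length >= d)."""
    blocks = [(sym, len(list(grp))) for sym, grp in groupby(x)]
    if not blocks or blocks[0][0] != 0 or blocks[-1][0] != 0:
        return False
    return all((length == 1) if sym == 1 else (length >= d) for sym, length in blocks)


def greedy_code(S, s):
    """First-fit s-deletion code: keep x when D_s(x) misses every kept deletion set."""
    used, code = set(), []
    for x in S:
        dx = deletion_set(x, s)
        if not dx & used:
            code.append(x)
            used |= dx
    return code


def theorem_5_1_bounds(S, s, n, q=2):
    """(lower, upper) of Theorem 5.1 for a source set S of length-n strings."""
    ds = set().union(*(deletion_set(x, s) for x in S))
    upper = sum(Fraction(1, len(deletion_set(x, s))) for x in ds)
    insertion_size = sum(comb(n, i) * (q - 1) ** i for i in range(s + 1))  # I_{q,s,n}
    lower = Fraction(len(S), comb(n + s - 1, s) * insertion_size)
    return lower, upper


n, d, s = 12, 2, 1
S = [x for x in product((0, 1), repeat=n) if is_rll_d_inf(x, d)]
lower, upper = theorem_5_1_bounds(S, s, n)
print(len(S), float(lower), len(greedy_code(S, s)), float(upper))
```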

A. Run-length limited sources 

In this section we demonstrate the idea above by applying the results of Theorem 5.1 to the specific application of run-length limited codes. For simplicity we consider only the single-deletion case, but the idea is more general and can be extended readily to a larger number of deletions. The background on these codes is sourced from the book chapter by Marcus, Roth and Siegel [43] and their extended monograph available online [44].

Recordings on a magnetic tape, when encoded into a binary string, result in strings that have no adjacent 1's and in which the number of 0's between two consecutive 1's is constrained to be in a certain range. Let $0\le d\le k$. A binary string is said to satisfy a $(d,k)$-run-length limited (RLL) constraint if a) the string contains no adjacent 1's, i.e., the length of any 1-run is unity, b) the first and the last runs are 0-runs, and c) the length of any 0-run is at least $d$ and at most $k$ [43]. In [44], the first and the last runs of 0's are allowed to have lengths less than $d$. In this section we assume, mainly for simplicity, that in a $(d,k)$-RLL string, the first and the last runs of the string must be 0-runs also having length at least $d$.

The problem of correcting errors in RLL strings has been considered by several authors (see [44, Chapter 9.5]), but most of these works consider erasures or substitutions (see [22] and the discussion therein). Most works that consider deletions consider the deletion of 0's only, since that is most relevant to the application (see, e.g., the discussion in [17]). Recently, Cheng et al. [23] and Paluncic et al. [24] have considered deletion errors in RLL strings for deletion of both 0's and 1's.

Assume that a set of RLL strings as defined above is to be transmitted through a single-deletion channel, wherein both 0's and 1's can be deleted. In the theorem below we derive a bound on the size of the largest codebook for a $(d,\infty)$-RLL set of strings. For $0\le d\le k$, by $S_n(d,k)\subseteq\mathbb{F}_2^n$ we denote the set of binary strings of length $n$ satisfying the $(d,k)$-RLL constraint. First, we characterize $D_1(S_n(d,\infty))$.

Lemma 5.1: Let $n, d\in\mathbb{N}$ and $1 < d < n$. Then $D_1(S_n(d,\infty)) = S_{n-1}(d,\infty)\cup S'_{n-1}(d,\infty)$, where $S'_{n-1}(d,\infty)$ is the set of binary strings of length $n-1$ such that the first and last runs are 0-runs, every 1-run has unit length, exactly one 0-run has length exactly $d-1$, and all other 0-runs have length at least $d$.

Proof: "$\subseteq$": Consider a string in $S_n(d,\infty)$. A deleted symbol must be a 0 or a 1.

1) If a 0 is deleted, there are two possibilities: either the run from which it is deleted has length $d$, or it has length greater than $d$. In the former case, the subsequence lies in $S'_{n-1}(d,\infty)$, while in the latter case, it lies in $S_{n-1}(d,\infty)$.

2) If a 1 is deleted, the 0-runs adjacent to the deleted 1 join to form a longer run of length at least $2d$; the subsequence thus lies in $S_{n-1}(d,\infty)$.

This shows that in either case, $D_1(S_n(d,\infty))\subseteq S_{n-1}(d,\infty)\cup S'_{n-1}(d,\infty)$.

"$\supseteq$": To show the opposite inclusion, it suffices to show that for any string $x\in S_{n-1}(d,\infty)\cup S'_{n-1}(d,\infty)$ there exists a string $y\in S_n(d,\infty)$ such that $y\in I_1(x)$. Consider an arbitrary $x\in S_{n-1}(d,\infty)\cup S'_{n-1}(d,\infty)$. Insert a 0 in the shortest 0-run of $x$ and call the resulting string $y$. Since $x$ has at most one 0-run of length $d-1$, it follows that $y$ lies in $S_n(d,\infty)$. ■

Using this lemma and Theorem 5.1, we will prove an upper bound on the size of a code for $S_n(d,\infty)$.

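Theorem 5.2: Let $n, d\in\mathbb{N}$, $1 < d < n$. The optimal codebook for $S_n(d,\infty)$, $C^*_{S_n(d,\infty),1}$, satisfies

$$|C^*_{S_n(d,\infty),1}| \le \sum_{r=0}^{\hat r}\binom{n-2-r-(d-1)(r+1)}{r}\frac{1}{2r+1} + \sum_{r=1}^{\hat r'}(r+1)\binom{n-2-r-(d-1)(r+1)}{r-1}\frac{1}{2r+1}, \tag{27}$$

where $\hat r = \left\lfloor\frac{n-1-d}{d+1}\right\rfloor$ and $\hat r' = \left\lfloor\frac{n-d}{d+1}\right\rfloor$.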



Proof: From Theorem 5.1 and the size of single-deletion sets stated in (4), $|C^*_{S_n(d,\infty),1}| \le \sum_{x\in D_1(S_n(d,\infty))}\frac{1}{r(x)}$. By Lemma 5.1, $D_1(S_n(d,\infty)) = S_{n-1}(d,\infty)\cup S'_{n-1}(d,\infty)$. Notice that, by definition of $S'_{n-1}(d,\infty)$, the sets $S_{n-1}(d,\infty)$ and $S'_{n-1}(d,\infty)$ are disjoint. Therefore,

$$|C^*_{S_n(d,\infty),1}| \le \sum_{x\in S_{n-1}(d,\infty)}\frac{1}{r(x)} + \sum_{x\in S'_{n-1}(d,\infty)}\frac{1}{r(x)}. \tag{28}$$



Since all 0-runs of a string in $S_{n-1}(d,\infty)$ have length at least $d$, all 1-runs have unit length, and the starting and ending runs are 0-runs, any string in $S_{n-1}(d,\infty)$ has an odd number of runs and at most $2\hat r+1$ runs, where $\hat r$ is as stated in the theorem. Therefore a string in $S_{n-1}(d,\infty)$ with, say, $2r+1$ runs has $r$ 1-runs of unit length and $r+1$ 0-runs of lengths, say, $\ell_1,\dots,\ell_{r+1}$, where each $\ell_i\ge d$. The number of strings with $2r+1$ runs in $S_{n-1}(d,\infty)$ is thus equal to the number of integral solutions $(\ell_1,\dots,\ell_{r+1})$ of

$$\sum_{i=1}^{r+1}\ell_i = n-1-r, \qquad \ell_i\ge d,\ 1\le i\le r+1.$$

By Lemma 2.2 this number is $\binom{n-2-r-(d-1)(r+1)}{r}$, whereby the first term in the right-hand side of (28) equals the first term in the right-hand side of (27).

Each string in $S'_{n-1}(d,\infty)$ also has an odd number of runs. Furthermore, it has at least three runs and at most $2\hat r'+1$ runs, where $\hat r'$ is defined in the statement of the theorem. Consider a string with $2r+1$ runs, with $r$ 1-runs and $r+1$ 0-runs. First choose the 0-run with length $d-1$; this can be chosen in $r+1$ ways. Let $\ell_1,\dots,\ell_r$ be the lengths of the remaining 0-runs. The number of choices for the lengths of the remaining runs is the number of integral solutions of

$$\sum_{i=1}^{r}\ell_i = n-1-r-(d-1), \qquad \ell_i\ge d,\ 1\le i\le r.$$

Using Lemma 2.2, the number of strings in $S'_{n-1}(d,\infty)$ with $2r+1$ runs is thus $(r+1)\binom{n-2-r-(d-1)(r+1)}{r-1}$. This proves that the second term in (27) equals its counterpart in (28). ■
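The counting in this proof can be cross-checked against direct enumeration. The sketch below is our own (names are hypothetical): it evaluates the right-hand side of (27), with $\hat r$ and $\hat r'$ as stated in the theorem, and compares it with $\sum_{x\in D_1(S_n(d,\infty))}1/r(x)$ computed by listing $S_n(d,\infty)$ and its single-deletion sets, which is the Theorem 5.1 upper bound for $s = 1$. It uses the RLL convention of this section and assumes $d\ge 2$.

```python
from fractions import Fraction
from itertools import groupby, product
from math import comb


def runs(x):
    return sum(1 for _ in groupby(x))


def is_rll_d_inf(x, d):
    """(d, inf)-RLL as assumed in Section V-A (first and last runs are 0-runs of length >= d)."""
    blocks = [(sym, len(list(grp))) for sym, grp in groupby(x)]
    if not blocks or blocks[0][0] != 0 or blocks[-1][0] != 0:
        return False
    return all((length == 1) if sym == 1 else (length >= d) for sym, length in blocks)


def theorem_5_2_bound(n, d):
    """Right-hand side of (27)."""
    r_hat = (n - 1 - d) // (d + 1)
    r_hat_p = (n - d) // (d + 1)

    def top(r):
        return n - 2 - r - (d - 1) * (r + 1)

    first = sum(Fraction(comb(top(r), r), 2 * r + 1) for r in range(r_hat + 1))
    second = sum(Fraction((r + 1) * comb(top(r), r - 1), 2 * r + 1)
                 for r in range(1, r_hat_p + 1))
    return first + second


def bruteforce_bound(n, d):
    """Sum of 1/r(x) over D_1(S_n(d, inf)), computed by enumeration."""
    S = [x for x in product((0, 1), repeat=n) if is_rll_d_inf(x, d)]
    D1 = {x[:i] + x[i + 1:] for x in S for i in range(n)}
    return sum(Fraction(1, runs(y)) for y in D1)


print([theorem_5_2_bound(n, 2) for n in range(3, 10)])
```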
Unfortunately, calculating these bounds in a simplified 
closed form does not appear to be easy. Our aim in this 
section was only to demonstrate the idea and the bound in 
Theorem 5.1. Exact calculation of these bounds is beyond 
the scope of this paper. 

With this we conclude the theoretical portion of the paper. In the following sections we study how our bounds compare numerically with the sizes of known codebooks and with other bounds.

VI. Numerical results 

Recall that the upper bounds guaranteed by Theorems 3.1, 4.1 and 5.1 were obtained by constructing a fractional transversal for the hypergraphs involved. To obtain an upper bound on the size of optimal codebooks for the deletion channel, it suffices to find the fractional matching number itself, and ideally one would like to have an expression for this number. We were not able to find such an expression and constructed a fractional transversal as a proxy for it.

In the case of a single deletion, there already exist codes that are known to be asymptotically good. This motivates a comparison between our bound for single-deletion correcting codes, the fractional matching number and the sizes of the best known codes, in order to ascertain the quality of these codes. To do this, the fractional matching problem for hypergraph $\mathcal{H}^D_{q,1,n}$ (for single deletions) was solved numerically in Matlab for various values of $q$ and $n$. Table I documents the results obtained.

(a) $q = 2$, binary
n | Lev-UB | $\lfloor\frac{q^n-q}{(n-1)(q-1)}\rfloor$ | LP-UB | $|VT_0(n)|$
13 | 1182 | 682 | 593 | 586

(b) $q = 3$

(c) $q = 4$

(d) $q = 5$
n | Lev-UB | $\lfloor\frac{q^n-q}{(n-1)(q-1)}\rfloor$ | LP-UB | |Tenengolts|
2 | 7 | 5 | 5 | 3
3 | 17 | 15 | 11 | 9
4 | 67 | 51 | 45 | 33
5 | 293 | 195 | 158 | 129
6 | 1146 | 781 | 657 | 527

TABLE I: The columns of the table show, from left to right, the string length $n$, the value of Levenshtein's bound from (2) (Lev-UB), the value of the upper bound obtained in Theorem 3.1, the fractional matching number $\nu^*(\mathcal{H}^D_{q,1,n})$ (LP-UB), and the sizes of the best known codes, for various values of $q$ and $n$. For binary alphabet, the best known codes are the Varshamov-Tenengolts codes $VT_0(n)$ [3], [2]. For larger alphabets, the best codes known to us are those of Tenengolts [5], whose size is denoted |Tenengolts|.

In each subtable of Table I, the columns contain, from left to right, the string length $n$, Levenshtein's upper bound (the strongest one from (2); denoted Lev-UB), the bound from Theorem 3.1, the value of the fractional matching number found numerically ($=\nu^*(\mathcal{H}^D_{q,1,n})$, denoted LP-UB), and the size of the best known code for each case. In the binary case the best known code is the Varshamov-Tenengolts code $VT_0(n)$, where

$$VT_a(n) = \left\{x\in\mathbb{F}_2^n : \sum_{i=1}^{n} i\,x_i\equiv a \pmod{n+1}\right\}.$$

$VT_0(n)$ is also conjectured [1] to be optimal for all $n$. For larger alphabets the best codes we know of are those of Tenengolts [5] (these are denoted |Tenengolts|). For each $q$, the largest value of $n$ is as far as we could compute with the resources available to us.

The first noticeable trend is that in every row the values decrease from left to right. Thus the strongest of Levenshtein's bounds from (2) is weaker than our non-asymptotic bound. Our non-asymptotic bound is in turn weaker than the fractional matching number (column LP-UB); this shows that the fractional transversal we constructed to obtain the upper bound is not the optimal fractional transversal.

Notice that in the binary case, shown in Table Ia, the size of the Varshamov-Tenengolts code $VT_0(n)$ closely matches LP-UB. This indicates that these codes are either optimal (as conjectured) or close to optimal, at least for n < 14. Sloane's website [4] lists numerically obtained bounds for n < 11, of which $VT_0(n)$ has been confirmed as optimal for n < 10. The bounds on the website were obtained by computing the Lovász theta function [35] of the graphs $L_{q,1,n}$. The results in Table I may be considered additions to Sloane's compilation.

For each q and n, Tenengolts' construction gives a two-parameter family of codes (the parameters being β, γ in [5, Eq (2)]). The column |Tenengolts| contains, for the respective q and n, the size of the largest code in this family. For the VT codes it is known that, of the family $VT_a(n)$, $a = 0, \ldots, n$, the code $VT_0(n)$ is the largest; we are not aware of a similar characterization of the largest code in Tenengolts' family. Thus the column |Tenengolts| was populated by explicitly calculating the size of the code for each value of the parameters and then identifying the largest. It is clear from this table that these codes are considerably smaller than the fractional matching number in LP-UB. This may mean either that there is a large gap between the fractional matching number and the matching number for these hypergraphs, or that the Tenengolts codes are not optimal.

Fig. 2: Values of $U_{q,s,n}$ (solid lines) and Levenshtein's bound from (2) (dotted lines) for q = 2, s = 2, 3, 4 and 15 < n < 30.

For larger numbers of deletions, no good codes are known apart from those found by search, so no meaningful comparison with existing codes can be made. However, we may compare our bound with Levenshtein's from (2). Figure 2 shows the comparison for the binary alphabet, s = 2, 3, 4 and 15 < n < 30. We focus on this range of n so that the lines for s = 2, 3, 4 coming from Levenshtein's bound can be clearly distinguished; for smaller values of n these lines overlap. The figure shows that our bound is significantly better than Levenshtein's.

We discuss the quality of our bound and prospects for 
improving it in the next section. 

VII. Discussion 

For the sake of this discussion, we limit ourselves to the case of the single-deletion channel. Table I shows that there is scope for improving our bound $\frac{q^n-q}{(q-1)(n-1)}$ for the q-ary single-deletion channel. Since the bound is not equal to the fractional matching number LP-UB, one can obtain a better bound merely by finding a fractional transversal with a smaller weight. In practice, however, such a construction has eluded us. In fact, our constructed transversal closely matches the optimal fractional transversal found numerically, which makes any improvement challenging. We discuss this below.
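The weight of the constructed transversal can be computed in closed form, which makes clear where the bound comes from. Assuming, as in Fig. 3, that $w(y) = 1/r(y)$ on the vertex set $\mathbb{F}_q^{n-1}$, and using the standard count $q(q-1)^{r-1}\binom{m-1}{r-1}$ of q-ary strings of length m with exactly r runs (here $m = n-1$), a short derivation gives

$$\sum_{y\in\mathbb{F}_q^{n-1}}\frac{1}{r(y)} = \sum_{r=1}^{n-1}\frac{q(q-1)^{r-1}}{r}\binom{n-2}{r-1} = \frac{q}{(n-1)(q-1)}\sum_{r=1}^{n-1}(q-1)^{r}\binom{n-1}{r} = \frac{q\,(q^{n-1}-1)}{(n-1)(q-1)} = \frac{q^n-q}{(q-1)(n-1)},$$

where the second step uses $\frac{1}{r}\binom{n-2}{r-1} = \frac{1}{n-1}\binom{n-1}{r}$ and the third uses the binomial theorem. Any improvement must therefore lower this sum while keeping every hyperedge covered.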

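Figure 3 shows the optimal fractional transversal and the fractional transversal we have constructed ($w(\cdot) = 1/r(\cdot)$) for the hypergraph $\mathcal{H}_{2,1,8}$ ($q = 2$, $n = 8$, $s = 1$) and for the hypergraph $\mathcal{H}_{5,1,4}$ ($q = 5$, $n = 4$, $s = 1$). Notice that in both cases the constructed fractional transversal follows the general trend of the optimal fractional transversal. This continues to hold for larger values of n. Indeed, in the binary case the constructed transversal has total weight $(2^n-2)/(n-1)$ spread over $2^{n-1}$ vertices, and since

$$\frac{1}{2^{n-1}}\cdot\frac{2^n-2}{n-1} \longrightarrow 0 \quad (n\to\infty),$$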
the average difference between the constructed and optimal transversals vanishes for large n. A tighter bound may be obtained by fine-tuning the constructed fractional transversal, but since our constructed transversal already captures the general trend of the optimal one, it is not obvious how such fine-tuning should proceed. Yet this effort is not a lost cause: since the number of vertices grows exponentially, a small saving per vertex in this construction may imply a substantial improvement in the bound.
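The vanishing average can be seen from the constructed transversal alone. In the binary case its total weight $(2^n-2)/(n-1)$ is spread over $2^{n-1}$ vertices, and the optimal transversal has at most this total weight, so the average per-vertex gap is nonnegative and at most $(2^n-2)/((n-1)2^{n-1}) \approx 2/(n-1)$. The short sketch below (function names hypothetical) evaluates this upper bound exactly.

```python
from fractions import Fraction


def constructed_weight(q, n):
    """Total weight (q^n - q) / ((q - 1)(n - 1)) of the constructed transversal, s = 1."""
    return Fraction(q**n - q, (q - 1) * (n - 1))


def average_constructed_weight(n):
    """Binary case: constructed weight spread over the 2^(n-1) vertices of H_{2,1,n}."""
    return constructed_weight(2, n) / 2 ** (n - 1)


for n in (8, 16, 32, 64):
    # The optimal average lies in [0, this value], so the per-vertex gap is at most this value.
    print(n, float(average_constructed_weight(n)))
```

The decay is only like 1/n, which is consistent with the remark that small per-vertex savings are hard to find yet could matter once multiplied by $2^{n-1}$ vertices.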

We end with one final consideration and speculate on what may be an alternative approach to obtaining better bounds. Since the most successful approaches to code construction for this problem have been number-theoretic, one may be inclined to conjecture that the size of the optimal codebook $|C^*_{q,1,n}|$ depends not only on the numerical value of n but also on the arithmetic properties of n. In the binary case in particular, since the fractional matching number $\nu^*(\mathcal{H}_{2,1,n})$ closely tracks $|VT_0(n)|$, which is given by a number-theoretic formula (see [1, Eq (7)]), it appears that $\nu^*(\mathcal{H}_{2,1,n})$ may also be given by a number-theoretic expression. In contrast, neither our bounds nor their proofs have any number-theoretic character. Perhaps a clue to tightening these bounds lies in a number-theoretic construction of the optimal fractional matching or of a better (possibly optimal) fractional transversal.
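To illustrate the number-theoretic character of $|VT_0(n)|$, a well-known closed form expresses it through the odd divisors of n+1: $|VT_0(n)| = \frac{1}{2(n+1)}\sum_{d \mid n+1,\ d \text{ odd}} \varphi(d)\,2^{(n+1)/d}$, with $\varphi$ Euler's totient. We do not claim this is written exactly as [1, Eq (7)]. The sketch below cross-checks the formula against brute-force enumeration for small n; `phi`, `vt0_size_formula` and `vt0_size_brute_force` are hypothetical helper names.

```python
from itertools import product
from math import gcd


def phi(d):
    """Euler's totient function (direct count; adequate for small d)."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)


def vt0_size_formula(n):
    """|VT_0(n)| = (1 / (2(n+1))) * sum over odd d | n+1 of phi(d) * 2^((n+1)/d)."""
    m = n + 1
    total = sum(phi(d) * 2 ** (m // d) for d in range(1, m + 1, 2) if m % d == 0)
    assert total % (2 * m) == 0  # the divisor sum is always a multiple of 2(n+1)
    return total // (2 * m)


def vt0_size_brute_force(n):
    """Count binary x of length n with sum_i i*x_i = 0 (mod n+1) by enumeration."""
    return sum(1 for x in product((0, 1), repeat=n)
               if sum(i * xi for i, xi in enumerate(x, start=1)) % (n + 1) == 0)


for n in range(1, 16):
    assert vt0_size_formula(n) == vt0_size_brute_force(n)
print([vt0_size_formula(n) for n in range(1, 16)])
```

The dependence on the divisors of n+1 is exactly the kind of arithmetic structure that none of the bounds in this paper exhibits.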

In summary, this paper considered the deletion channel 
for general q-ary alphabet and an arbitrary number of 
deletions and proved new non-asymptotic upper bounds on 
the sizes of the optimal codebooks. The bounds are stronger 
than known bounds and imply classical asymptotic bounds. 
The bounds were derived via a hypergraph characteriza- 
tion of the optimal codebook and a linear programming 
argument. The approach was extended to derive bounds 
on codebooks for general constrained sources and was 
demonstrated for run-length limited sources. The paper 
concluded with a discussion of numerical results and of the quality of these bounds.
Fig. 3: The horizontal axis consists of the elements of $\mathbb{F}_2^7$ and $\mathbb{F}_5^3$, respectively, plotted in increasing order of their decimal value. The vertical axis is the value of the fractional transversals. In each case, the dotted line shows the optimal fractional transversal and the solid line shows the constructed fractional transversal $w(x) = 1/r(x)$, for $\mathcal{H}_{2,1,8}$ and $\mathcal{H}_{5,1,4}$, respectively. These lines are provided to aid in discerning the trends in the values; they have no meaning per se.



Appendix 
Proof of Theorem 4.4 

Proof: First consider $\tau \in [0, \tfrac{1}{2})$. For such a value of τ, the bound (20) applies. By (20),



(l-T)n 



E 



= (2t-1) 

,_i/^(l-T)n-l 
r - 1 



Notice that the second sum, being a mere polynomial in n, can be ignored in comparison to the first sum. Below, we focus only on the first sum and estimate its asymptotics by finding its exponent.

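Put $r = \rho n$ so that $\rho \in [0, 1-\tau]$, and let

$$N(\rho;\tau) \triangleq \lim_{n\to\infty}\frac{1}{n}\log_q\left(q(q-1)^{\rho n-1}\binom{(1-\tau)n-1}{\rho n-1}\right),$$

$$D_1(\rho;\tau) \triangleq \lim_{n\to\infty}\frac{1}{n}\log_q S(\rho n,\tau n),$$

$$D_2(\rho;\tau) \triangleq \lim_{n\to\infty}\frac{1}{n}\log_q \sum_{i=(2\tau-1+\rho)n-1}^{\min(\tau n-2,\,\rho n-3)} S(\rho n-2,i).$$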



Here $N(\rho;\tau)$ is the exponent of the numerator, and the exponent of the denominator is

$$D(\rho;\tau) = \max\big(D_1(\rho;\tau), D_2(\rho;\tau)\big).$$

Therefore, the asymptotic rate function satisfies

$$R_q(\tau) \le \max_{0\le\rho\le 1-\tau}\big(N(\rho;\tau) - D(\rho;\tau)\big).$$
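The exponent of the numerator is a standard binomial limit. Writing $h_q(x) = x\log_q(q-1) + h(x)/\log_2 q$ for the q-ary entropy (assumed here to be the meaning of $h_q$, with h the binary entropy in bits), one has $N(\rho;\tau) = (1-\tau)\,h_q(\rho/(1-\tau))$, the value computed next in the proof. The Python sketch below compares the finite-n quantity inside the limit with this closed form, using `math.lgamma` for the logarithm of the binomial coefficient; the function names are hypothetical.

```python
from math import lgamma, log


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log(x, 2) - (1 - x) * log(1 - x, 2)


def hq(x, q):
    """q-ary entropy: x log_q(q - 1) + h2(x) / log2(q)."""
    return x * log(q - 1) / log(q) + h2(x) / log(q, 2)


def limit_exponent(rho, tau, q):
    """Closed form (1 - tau) h_q(rho / (1 - tau)) for N(rho; tau)."""
    return (1 - tau) * hq(rho / (1 - tau), q)


def finite_n_exponent(rho, tau, q, n):
    """(1/n) log_q of q (q-1)^(rho n - 1) binom((1 - tau) n - 1, rho n - 1)."""
    m = round((1 - tau) * n) - 1   # (1 - tau) n - 1
    k = round(rho * n) - 1         # rho n - 1
    log_binom = lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
    log_count = log(q) + k * log(q - 1) + log_binom   # natural logarithm
    return log_count / (n * log(q))


rho, tau, q = 0.3, 0.2, 4
for n in (10**3, 10**4, 10**5):
    print(n, finite_n_exponent(rho, tau, q, n), limit_exponent(rho, tau, q))
```

The gap shrinks roughly like $(\log n)/n$, as Stirling's formula predicts, so the polynomial factors dropped in the proof indeed do not affect the exponent.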

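We now calculate the above exponents. It is easy to see that

$$N(\rho;\tau) = (1-\tau)\,h_q\!\left(\frac{\rho}{1-\tau}\right),$$

which is as required. Next consider $D_1(\rho;\tau)$. Clearly, if $\rho \le \tau$, then $D_1(\rho;\tau) = 0$. If $\tau < \frac{\rho-\tau}{2}$, i.e., $\rho > 3\tau$,

$$D_1(\rho;\tau) = (\rho-\tau)\,\frac{h\!\left(\frac{\tau}{\rho-\tau}\right)}{\log_2 q}.$$

On the other hand, if $\tau < \rho \le 3\tau$, then $D_1(\rho;\tau) = \frac{\rho-\tau}{\log_2 q}$. In summary, we get

$$D_1(\rho;\tau) = \mathbb{I}\{\rho>\tau\}\,\frac{(\rho-\tau)\,h\!\left(\min\left(\frac{\tau}{\rho-\tau},\frac{1}{2}\right)\right)}{\log_2 q}.$$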
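The cap at 1/2 in this summary is the familiar behavior of partial sums of binomial coefficients: for fixed λ, $\frac{1}{m}\log_2\sum_{i\le\lambda m}\binom{m}{i} \to h(\min(\lambda,\tfrac12))$, because for λ above 1/2 the sum already contains most of the $2^m$ total. The sketch below checks this standard fact numerically. It does not restate the definition of S from (19), which lies outside this excerpt; `partial_sum_exponent` is a hypothetical name.

```python
from math import comb, log2


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def partial_sum_exponent(lam, m):
    """(1/m) log2 of the partial binomial sum sum_{i=0}^{floor(lam*m)} binom(m, i)."""
    k = int(lam * m)
    return log2(sum(comb(m, i) for i in range(k + 1))) / m


for lam in (0.1, 0.2, 0.4, 0.6, 0.8):
    # Exact big-integer sum on the left, entropy limit with the 1/2 cap on the right.
    print(lam, partial_sum_exponent(lam, 2000), h2(min(lam, 0.5)))
```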



Now consider $D_2(\rho;\tau)$. Recall from (19) that if $i < 0$, then $S(\rho n-2, i) = 0$. In the expression for $D_2(\rho;\tau)$, put $i = \mu n$, so that $\mu \in [\max(2\tau+\rho-1, 0), \min(\tau,\rho)]$. Then, arguing as above, we get

$$D_2(\rho;\tau) = \max_{m_{\tau,\rho}\le\mu\le\min(\tau,\rho)} \mathbb{I}\{\rho>\mu\}\,\frac{(\rho-\mu)\,h\!\left(\min\left(\frac{\mu}{\rho-\mu},\frac{1}{2}\right)\right)}{\log_2 q},$$

where $m_{\tau,\rho} = \max(2\tau+\rho-1, 0)$, as stated in the theorem.

We now show that $D_2(\rho;\tau)$ dominates $D_1(\rho;\tau)$ for any $\rho, \tau$. If $\rho \le \tau$, then $D_1(\rho;\tau) = 0$, so, clearly, $D_2(\rho;\tau) \ge D_1(\rho;\tau)$. If instead $\rho > \tau$, we find that $\mu = \tau$ satisfies $\mu \in [m_{\tau,\rho}, \min(\tau,\rho)]$. To see this, observe that (a) $\min(\tau,\rho) = \tau$, since $\rho > \tau$, and (b) $\tau \ge m_{\tau,\rho}$ if and only if $\rho \le 1-\tau$, which is the assumed range of $\rho$. But for $\mu = \tau$ the value of the maximand above equals $D_1(\rho;\tau)$. Consequently, $D_2(\rho;\tau)$, which involves a maximization over $\mu$, dominates $D_1(\rho;\tau)$. In summary,

$$D(\rho;\tau) = D_2(\rho;\tau),$$

as required. This completes the first part of the theorem, pertaining to $\tau \in [0, \tfrac{1}{2})$.
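Step (b) is a one-line inequality: since $\tau \ge 0$, $\max(2\tau+\rho-1, 0) \le \tau$ holds exactly when $2\tau+\rho-1 \le \tau$, i.e., $\rho \le 1-\tau$. It can also be confirmed mechanically. The sketch below (hypothetical helper names `mu_range` and `tau_in_range`) checks on an exact rational grid that, whenever $\rho > \tau$, the point $\mu = \tau$ lies in $[m_{\tau,\rho}, \min(\tau,\rho)]$ exactly when $\rho \le 1-\tau$.

```python
from fractions import Fraction


def mu_range(rho, tau):
    """Interval [m_{tau,rho}, min(tau, rho)] with m_{tau,rho} = max(2 tau + rho - 1, 0)."""
    return max(2 * tau + rho - 1, 0), min(tau, rho)


def tau_in_range(rho, tau):
    """True iff mu = tau is an admissible point of the maximization defining D_2."""
    lo, hi = mu_range(rho, tau)
    return lo <= tau <= hi


grid = [Fraction(k, 60) for k in range(61)]
for tau in grid:
    for rho in grid:
        if rho > tau:
            # Claim (b): mu = tau is admissible exactly when rho <= 1 - tau.
            assert tau_in_range(rho, tau) == (rho <= 1 - tau)
print("claim verified on the grid")
```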

Now consider $\tau \ge \tfrac{1}{2}$ and use the trivial bound from (25). In this case, clearly,

$$R_q(\tau) \le 1-\tau.$$

This covers all cases and the proof is complete. ■